Pyspark Developer

Stamford, CT

Job Description

PySpark/Big Data Integration Developer

The PySpark/Big Data Integration Developer is the lead developer role in Data Engineering, responsible for development of the data management platform. Reporting to the head of Data Engineering, this lead will manage a team of developers to build out integration jobs using Spark and other Big Data/Hadoop frameworks. This role will be key to the rollout of the data platform and will partner with Data Analytics and other technology teams. Our environment is dynamic, fast-paced, and lots of fun.


Key Responsibilities:

  • Coordinate with cross-functional and testing teams, and drive resolution of open items and issues
  • Be customer-focused and work well in a team environment
  • Multi-task across multiple projects and prioritize correctly
  • Work with cross-border technical team members
  • Interface with the business and analytics teams
  • Drive a test-driven development approach across the data environments
  • Develop using a CI/CD framework, setting up best practices where necessary



Qualifications:

  • 8+ years of experience in data warehousing in the media industry and with consumer data
  • 3+ years of technology experience with onshore and offshore reporting development teams
  • 3+ years of Agile development experience
  • 3+ years of experience with the Hadoop ecosystem, including Spark, Storm, HDFS, Hive, HBase, and other NoSQL databases
  • BS in Computer Science or similar technical degree
  • Experience developing Spark Streaming applications and analyzing data through Spark (conducting ETL processes and connecting to different SQL and Redshift databases)
  • Experience writing queries to move data from HDFS to Hive and analyze data
  • Understanding of partitions, Hive query optimization, bucketing, etc.
  • Experience with Sqoop for moving data between RDBMS and HDFS
  • Strong understanding of programming paradigms such as distributed architectures and multi-threaded program design
  • Very strong in algorithms and collection frameworks, with the ability to build high-performance engines that handle large amounts of data
  • Experience working in a data warehouse environment and with ETL is a huge plus


Apply Now