Senior Data Engineer

Stamford, CT

Job Description


Harvey Nash is seeking a Senior Data Engineer to support client initiatives within the Media sector. The role is based in Stamford, CT.

My client is looking for a team to help with the following:

  • Build solutions that ingest data in real time from front-end apps, transform it, and push it via an API to a CRM system (Sailthru) for real-time, optimized marketing. The data will be pushed via a Kinesis stream, and the solution must be built for high volume, low latency, and high reliability: the email has to be triggered within 10 seconds.
  • Build solutions that ingest data from various publisher APIs such as YouTube, Instagram, and Facebook, with algorithms built in to account for quota limits, spikes in usage, and so on. This will likely involve multi-threading and should have data quality and maintenance checks built in.

Senior Software Engineer


  • Explore and discover new data sources, and quickly become familiar with the available APIs or other data acquisition methods, such as web scraping, to ingest data
  • Build quick proofs of concept for new data sources to showcase data capabilities and help the analytics team identify key metrics and dimensions
  • Design, develop and maintain data ingestion & integration pipelines from various sources, which may include contacting primary or third-party data providers to resolve questions and inconsistencies and/or obtain missing data
  • Design, implement and manage near real-time ingestion & integration pipelines
  • Analyze data to identify outliers and missing, incomplete, and/or invalid data; ensure accuracy of all data from source to final deliverable by creating automated quality checks
  • Evangelize an extremely high standard of code quality, system reliability, and performance.


  • Bachelor’s degree in Computer Science or a related discipline
  • Minimum 8 years of experience in backend programming languages
  • Minimum 4 years of experience in API-based development using Python
  • Experience in designing & building secure, reliable and high-performance data pipelines using Python and Spark on AWS
  • Experience with Python libraries such as Pandas, NumPy, SciPy, Flask, and SQLAlchemy, and/or with automation, is a plus
  • Experience in real-time data processing using Python, Spark and Spark Streaming
  • Experience building and deploying RESTful services utilizing popular open source frameworks
  • Experience in developing solutions at scale to empower the business and support a wide variety of use cases, from experimental work to mission-critical production operations.
  • Experience in AWS Kinesis Stream Processing, EMR, Redshift, S3, Lambda
  • Experience in databases such as AWS Redshift, BigQuery, SQL Server or Oracle
  • Experience with multi-threading and asynchronous event-driven programming
  • Experience with high volume, high availability distributed systems
  • Experience in coming up with viable solutions to tough engineering problems
  • Self-driven and proactive with the ability to work both independently and in groups
  • Knowledge of code versioning tools such as Git, Mercurial or SVN
  • Familiarity with the Sailthru APIs is a plus

