Data Engineer 5 (Big Data Engineer)
Position: Data Engineer 5 (Big Data Engineer)
Location: Plano, TX
Duration: 6 Months Contract
Responsible for completing our transition to fully automated operational reporting across different functions within Retail, and for bringing our Retail Big Data capabilities to the next level by designing and implementing a new analytics governance model. Emphasis is on architecting consistent root-cause-analysis procedures that enhance operational and customer-engagement results.
Big Data Engineers serve as the backbone of the Strategic Analytics organization, ensuring both the reliability and applicability of the team's data products to the entire Client organization. They have extensive experience with ETL design, coding, and testing patterns as well as engineering software platforms and large-scale data infrastructures. Big Data Engineers have the capability to architect highly scalable end-to-end pipelines using different open-source tools, including building and operationalizing high-performance algorithms.
Big Data Engineers understand how to apply technologies to solve big data problems, with expert knowledge of languages and technologies such as Java, Python, Linux, PHP, Hive, Impala, and Spark. Extensive experience working with both 1) big data platforms and 2) real-time/streaming delivery of data is essential.
Big Data Engineers implement complex big data projects with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into actionable deliverables across customer-facing platforms. They have a strong aptitude for deciding on the required hardware and software design, and can guide the development of such designs through both proofs of concept and complete implementations.
Responsibilities:
- Design and implement data models.
- Work with multiple teams to define architectural solutions for data ingestion and distribution.
- Work with other data engineers to build solutions via AWS cloud big data platform.
- Analyze and define internal user requirements for data exchange.
- Integrate functional, non-functional, and technical requirements into detailed designs.
- Load data from disparate data sets by leveraging various big data technologies, e.g. Kafka.
- Pre-process data using Hive, Impala, Spark, and Pig.
- Perform Hadoop technical development and implementation.
- Support and steer project stakeholders through efficient design, development, testing, and deployment phases.
- Develop, manage and communicate current state and future state architectural models, keeping them aligned to changing business needs.
- Ensure necessary quality assurance processes are implemented and followed.
- Ensure solutions are high quality, performant, and scalable.
- Maintain security and data privacy in an environment secured using Kerberos and LDAP.
- Follow and contribute to engineering best practices for source control, release management, deployment, etc.
- Provide production support, job scheduling/monitoring, ETL data quality checks, and data freshness reporting.
- Support UAT (User Acceptance Testing) for implemented solutions.
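To give candidates a concrete sense of the ETL data quality and data freshness work listed above, here is a minimal, library-free sketch of a row-level quality gate that a batch pipeline might run before publishing. The column names (`order_id`, `store_id`, `loaded_at`) and the 24-hour freshness threshold are invented for illustration, not taken from the actual role.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical required columns for the example records.
REQUIRED_FIELDS = ("order_id", "store_id", "loaded_at")

def check_batch(rows, max_age_hours=24, now=None):
    """Split a batch of dict records into (clean_rows, issues).

    A record is flagged if a required field is missing/empty, or if its
    load timestamp is older than the freshness threshold.
    """
    now = now or datetime.now(timezone.utc)
    clean, issues = [], []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        if missing:
            issues.append((row, f"missing fields: {missing}"))
            continue
        age = now - row["loaded_at"]
        if age > timedelta(hours=max_age_hours):
            issues.append((row, f"stale record: {age}"))
            continue
        clean.append(row)
    return clean, issues
```

In a real Hive/Spark pipeline the same checks would typically be pushed down into the query layer; this sketch only shows the shape of the logic.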
Requirements:
- 10+ years of experience leading solutions architecture and/or enterprise architecture for data-intensive, scalable, high-throughput integration platforms, leveraging knowledge of industry standards, the latest data integration, storage, and distribution technologies, and engineering practices.
- Strong end-to-end SQL experience.
- Proven understanding and hands-on experience with Hadoop, Hive, Pig, Impala, and Spark, as well as writing Unix, Bash, and Korn shell scripts.
- 5+ years of experience with Python, statistical analysis and data mining techniques.
- 5-8 years of demonstrated experience and success in data modeling.
- Extensive experience with complete life-cycle implementation of a big data cloud ecosystem on AWS, including components such as EC2, S3, SNS, and SQS.
- Understanding of query algorithms and optimization in a cloud environment, including high-speed querying using in-memory technologies such as Spark.
- Understand how to develop code in an environment secured using a local KDC and OpenLDAP.
- Extensive experience handling diverse file formats.
- Knowledge of and ability to implement workflows/schedulers within Oozie.
- Excellent analytical skills, critical thinking, and communication skills.
- A demonstrated strong, analytical, and data-driven mindset.
- Capable of working at the detail level while also seeing the big picture.
- B.S. or M.S. in Computer Science or Engineering.
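As an illustration of the Oozie workflow knowledge called for above, a minimal workflow definition might look like the following. This is a sketch only: the workflow name, action name, and script path are invented, and property placeholders such as ${jobTracker} would come from a job properties file.

```xml
<workflow-app name="daily-retail-etl" xmlns="uri:oozie:workflow:0.5">
    <start to="preprocess"/>
    <!-- Hypothetical Hive pre-processing step -->
    <action name="preprocess">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>preprocess.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A production workflow would typically chain several such actions (ingestion, pre-processing, quality checks) and be triggered on a schedule by an Oozie coordinator.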