PySpark/Hadoop Data Engineer
3 days to apply

Capgemini Singapore PTE. LTD.
Posted date: a month ago
Minimum level: N/A
Job category: Human Resources
• The Data Engineer should be able to understand business, functional, and technical requirements and build effective data transformation jobs using Python, PySpark, or Scala frameworks.
• Strong hands-on expertise in creating optimized data pipelines in PySpark/Python/Scala; produce unit tests for Spark transformations and helper methods (see the sketch after this list).
• Understand complex transformation logic and translate it into PySpark/Spark SQL/Hive pipelines that ingest data from source systems into the Data Lake (Hive/HBase/Parquet) and Enterprise Data Domain tables.
• Work closely with the Business Analyst team to review test results and obtain sign-off.
• Prepare the necessary design and operations documentation for future use.
• Perform peer code-quality reviews and act as gatekeeper for quality checks.
• Hands-on coding, usually in a pair programming environment.
• Work in highly collaborative teams and build quality code.
• The candidate must exhibit a good understanding of data structures, data manipulation, distributed processing, application development, and automation.
• Familiarity with Oracle, Spark Streaming, Kafka, and ML.
• Knowledge of RDBMS concepts and hands-on experience with PL/SQL.
• Develop applications using the Hadoop tech stack and deliver them effectively, efficiently, on time, to specification, and cost-effectively.
• Ensure smooth production deployments as per plan, with post-deployment verification.
• This Hadoop Developer will play a hands-on role, developing quality applications within the desired timeframes and resolving team queries.
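For illustration only (not part of the posting): a minimal sketch of the kind of PySpark transformation and unit test the responsibilities above describe. The function name, sample schema, and column names are hypothetical, and it assumes a local PySpark installation.

# Hypothetical example: a small Spark transformation plus a unit test.
# clean_transactions and its schema are illustrative, not from the posting.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def clean_transactions(df: DataFrame) -> DataFrame:
    # Drop rows without an account ID and normalise the amount column.
    return (
        df.where(F.col("account_id").isNotNull())
          .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    )

def test_clean_transactions():
    # A local, single-threaded session keeps the test self-contained.
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("unit-test")
             .getOrCreate())
    src = spark.createDataFrame(
        [("A1", "10.5"), (None, "3.0")],
        ["account_id", "amount"],
    )
    out = clean_transactions(src)
    assert out.count() == 1                          # null key dropped
    assert dict(out.dtypes)["amount"] == "decimal(18,2)"
    spark.stop()

Such tests can be run with pytest; keeping transformations as pure DataFrame-in/DataFrame-out functions is what makes them unit-testable.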
Technical Requirements:
• Hadoop data engineer with 4-6 years of total experience and strong experience in Hadoop, Spark, PySpark, Scala, Hive, Spark SQL, Python, Impala, CI/CD, Git, Jenkins, Agile methodologies, DevOps, and Cloudera Distribution.
• Strong knowledge of data warehousing methodology and Change Data Capture (see the CDC sketch after this list).
• 5+ years of relevant Hadoop and Spark/PySpark experience is mandatory.
• Good knowledge of and experience with any RDBMS (MariaDB, SQL Server, MySQL, or Oracle); knowledge of stored procedures is an added advantage.
• Exposure to TWS (Tivoli Workload Scheduler) jobs for scheduling.
• Strong in enterprise data architectures and data models.
• Good experience in the Core Banking/Finance domain.
• Exposure to the AML (Anti-Money Laundering) domain preferred, but not mandatory.
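Again for illustration only: a hedged sketch of one common Change Data Capture pattern in PySpark, keeping the latest record per business key when merging an incremental batch into a base Parquet snapshot. The paths and the customer_id/updated_at columns are assumptions, not details from the posting.

# Hypothetical CDC merge: latest-record-wins per business key.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

base = spark.read.parquet("/data/lake/customers")        # current snapshot (assumed path)
changes = spark.read.parquet("/data/staging/customers")  # incoming CDC batch (assumed path)

# Rank all versions of each key by recency and keep only the newest.
latest = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
merged = (
    base.unionByName(changes)
        .withColumn("rn", F.row_number().over(latest))
        .where(F.col("rn") == 1)
        .drop("rn")
)

# Write to a separate path rather than overwriting the input in place.
merged.write.mode("overwrite").parquet("/data/lake/customers_merged")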
JOB SUMMARY
PySpark/Hadoop Data Engineer
Capgemini Singapore PTE. LTD.
Singapore
Full-time
Posted: a month ago
3 days to apply