Data Engineering Expert

3 CUBED BUSINESS CONSULTING PTE. LTD.
Posted: 6 days ago
Minimum level: N/A
1. Data Engineering & Platform Knowledge (Must)
• Strong understanding of Hadoop ecosystem: HDFS, Hive, Impala, Oozie, Sqoop, Spark (on YARN).
• Experience in data migration strategies (lift & shift, incremental, re-engineering pipelines).
• Knowledge of Databricks architecture (Workspaces, Unity Catalog, Clusters, Delta Lake, Workflows).
2. Testing & Validation (Preferred)
• Data reconciliation (source vs. target).
• Performance benchmarking.
• Automated test frameworks for ETL pipelines.
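The reconciliation items above reduce to comparing row counts and row-level checksums between the Hadoop source and the Databricks target. The sketch below shows that logic in plain Python for brevity; in a real migration these checks would run as Spark aggregations, and all function and field names here are hypothetical.

```python
import hashlib

def row_checksum(row: dict) -> str:
    """Deterministic checksum over a row's sorted key/value pairs."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()

def reconcile(source: list[dict], target: list[dict]) -> dict:
    """Compare row counts and the set of row checksums between two datasets."""
    src_sums = {row_checksum(r) for r in source}
    tgt_sums = {row_checksum(r) for r in target}
    return {
        "count_match": len(source) == len(target),
        "missing_in_target": len(src_sums - tgt_sums),
        "unexpected_in_target": len(tgt_sums - src_sums),
    }

source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
target = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}]  # amt drifted during migration
report = reconcile(source, target)
```

Checksum-set comparison catches value drift that a bare row-count check would miss, as in the example above where counts match but one row's amount changed.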
3. Databricks-Specific Expertise (Preferred)
• Delta Lake: ACID transactions, time travel, schema evolution, Z-ordering.
• Unity Catalog: Catalog/schema/table design, access control, lineage, tags.
• Workflows/Jobs: Orchestration, job clusters vs. all-purpose clusters.
• SQL Endpoints / Databricks SQL: Designing downstream consumption models.
• Performance Tuning: Partitioning, caching, adaptive query execution (AQE), Photon runtime.
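As a conceptual aid for the Delta Lake item, "time travel" can be pictured as reading earlier immutable committed versions of a table. The toy class below (plain Python, all names hypothetical) mimics that behaviour; real Delta tables implement it with a transaction log over Parquet files, queried via `VERSION AS OF` / `TIMESTAMP AS OF`.

```python
class VersionedTable:
    """Toy model of Delta Lake-style versioning ("time travel")."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    def write(self, rows):
        # Each write commits a new immutable snapshot, like a Delta transaction.
        self._versions.append(list(rows))

    @property
    def version(self):
        return len(self._versions) - 1

    def read(self, version=None):
        # version=None reads the latest snapshot; an int reads "VERSION AS OF n".
        idx = self.version if version is None else version
        return self._versions[idx]
```

The immutable-snapshot view is also why schema evolution and rollback are cheap in Delta: old versions are never rewritten, only superseded.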
4. Migration & Data Movement (Preferred)
• Data migration from HDFS/Cloudera to cloud storage (ADLS/S3/GCS).
• Incremental ingestion techniques (Change Data Capture, Delta ingestion frameworks).
• Mapping Hive Metastore to Unity Catalog (metastore migration).
• Refactoring HiveQL/Impala SQL to Databricks SQL (syntax differences).
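Incremental (CDC) ingestion ultimately reduces to merge/upsert semantics against the target table. The sketch below shows that logic in plain Python; on Databricks this is typically a Delta Lake `MERGE INTO` statement or an Auto Loader pipeline, and the event record shape used here is an assumption.

```python
def apply_cdc(target: dict, changes: list[dict]) -> dict:
    """Apply CDC events to a target table keyed by primary key.

    Mirrors MERGE INTO semantics: delete events remove the matched row,
    everything else upserts the new payload.
    """
    for event in changes:
        key = event["id"]
        if event["op"] == "delete":
            target.pop(key, None)        # WHEN MATCHED AND op = 'delete' THEN DELETE
        else:
            target[key] = event["data"]  # WHEN MATCHED THEN UPDATE / NOT MATCHED THEN INSERT
    return target

table = {1: {"name": "alice"}, 2: {"name": "bob"}}
table = apply_cdc(table, [
    {"id": 2, "op": "update", "data": {"name": "bobby"}},
    {"id": 3, "op": "insert", "data": {"name": "carol"}},
    {"id": 1, "op": "delete", "data": None},
])
```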
5. Security & Governance (Nice to have)
• Mapping Cloudera Ranger/SSO policies → Unity Catalog RBAC.
• Azure AD / AWS IAM integration with Databricks.
• Data encryption, masking, anonymization strategies.
• Service Principal setup & governance.
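For the masking/anonymization item, one common pattern is deterministic pseudonymization: hash the sensitive value with a salt so joins and grouping still work while the raw value is unrecoverable. A minimal sketch follows (the salt and function name are hypothetical; on Databricks this would typically be applied as a UDF or a Unity Catalog column mask).

```python
import hashlib

def mask_email(email: str, salt: str = "demo-salt") -> str:
    """Deterministically pseudonymize the local part of an email,
    keeping the domain so domain-level analytics still work."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode()).hexdigest()[:12]
    return f"{digest}@{domain}"
```

Because the function is deterministic for a given salt, the same person maps to the same pseudonym across tables, which preserves join keys without exposing PII.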
6. DevOps & Automation (Nice to have)
• Infrastructure as Code (Terraform for Databricks, Cloud storage, Networking).
• CI/CD for Databricks (GitHub Actions, Azure DevOps, Databricks Asset Bundles).
• Cluster policies & job automation.
• Monitoring & logging (Databricks system tables, cloud-native monitoring).
7. Cloud & Infra Skills (Nice to have)
• Strong knowledge of the target cloud (AWS/Azure/GCP):
o Storage (S3/ADLS/GCS).
o Networking (VNETs, Private Links, Security Groups).
o IAM & Key Management.
8. Soft Skills
• Ability to work with business stakeholders for data domain remapping.
• Strong documentation and governance mindset.
• Cross-team collaboration (infra, security, data, business).
JOB SUMMARY
Data Engineering Expert
3 CUBED BUSINESS CONSULTING PTE. LTD.
Singapore
Contract / Freelance / Self-employed