DevOps Engineer / Site-Reliability Engineer
THIRD PARTY CONSULTING PTE. LTD.
Posted date: 12 days ago
Minimum level: N/A
Job category: Human Resources
Key Responsibilities
Cluster Operations & Management
- Manage and maintain container clusters (Kubernetes, Docker) and open-source component clusters (Kafka, Redis, Elasticsearch) across multiple business units
- Ensure optimal performance, scalability, and reliability of distributed systems
Infrastructure Platform Development
- Design, build, and enhance infrastructure operation platforms
- Develop and maintain systems for infrastructure management, CI/CD pipelines, monitoring/alerting, and centralized logging
- Drive platform standardization and automation initiatives
High Availability & Reliability
- Ensure maximum uptime for production services through proactive monitoring and incident response
- Continuously optimize service architecture, deployment strategies, and operational processes
- Implement and maintain SLA/SLO frameworks and reliability engineering practices
Automation & Process Improvement
- Lead the development of automated operations and maintenance systems
- Create self-service tools and workflows to improve team productivity
- Establish best practices for infrastructure as code and configuration management
Required Qualifications
Experience & Education
- 2+ years of hands-on experience in Systems Operations, DevOps, or Site Reliability Engineering (SRE)
- Bachelor's degree in Computer Science, Engineering, or a related technical field preferred
- Strong command of both English and Chinese for effective communication in a multicultural setting
Cloud & Infrastructure
- Experience with public cloud platforms (AWS, Azure, or GCP) is highly valued
- Strong understanding of large-scale internet architecture and distributed systems
- Proven experience with infrastructure monitoring, logging, and observability tools
Technical Skills
- Proficiency in scripting and automation using Shell, Python, or similar languages
- Strong knowledge of containerization technologies (Kubernetes, Docker)
- Hands-on experience operating production-grade container clusters and managing CI/CD pipelines
- Strong familiarity with common infrastructure components: Nginx, MySQL, Redis, Kafka, Elasticsearch
Advanced Networking (Preferred)
- Experience with Service Mesh architectures, Cilium CNI, and eBPF technologies
- Understanding of network security, load balancing, and traffic management
- Knowledge of cloud-native networking patterns and best practices
JOB SUMMARY
Singapore
Full-time