
Sr. Site Reliability Engineer
Snaphunt Pte Ltd | Date Posted: 31-Dec-2020
Job Nature:
Permanent
Position Level:
Entry Level, Experienced
Job Category:
Qualification:
ITE / NITEC / Higher NITEC, Diploma, Bachelor's / Honours
Job Description
- Opportunity to work alongside & learn from international teammates
- Great work environment and positive culture
- Attractive perks and benefits
Our client is an app monetisation and advertising platform with multiple offices around the world. With innovative tools and solutions, they are a reliable partner of businesses seeking to accelerate growth, engagement and returns.
The Job
In this role, your primary focus is to improve the long-term health of the system. You will be responsible for:
- Maintaining the reliability of the platform and processes, as well as automating recurring tasks
- Analysing systems based on data points to identify workloads critical to the business
- Collaborating with engineering and product teams to ensure success of system operation
- Monitoring system behavior to detect anomalies and resolve them in a timely manner
- Supporting the stack in the event of a failure
- Undertaking on-call responsibility, managing crises with the broader team, and communicating progress and challenges
The Profile
- You have at least 3 years' experience in an SRE / systems administration role, with a background in software development
- Expertise in Linux systems administration is mandatory
- You have experience with Multi-Cloud Computing (AWS, GCP, Azure, etc.) and building tools to automate system maintenance tasks.
- You have a strong understanding of server automation systems (Chef, Puppet, Ansible, Terraform) and monitoring tools, and the ability to define metrics to detect anomalies
- You possess hands-on Kubernetes or Docker experience, including deployment tools (Spinnaker, Istio)
- Scripting using any language (Go, Node.js, Bash, Python, etc.) is required
- You have demonstrated experience with Datadog, Stackdriver, CloudWatch, Splunk, ELK or other log processing & alerting systems
- You have experience in cloud-based networking (HAProxy, WAF, ELB, ALB, distributed multi-cloud VPC)
- Previous professional experience writing in Golang, Java, Scala, C or C++ is a plus
- You ideally have an understanding of various security standards, protocols and implementation details
- Prior experience with Akamai and management of a distributed Kafka cluster is advantageous
- You are flexible and pragmatic, with the ability to juggle multiple priorities
- You are highly analytical and are able to identify problem components based on data points.
Ref: 39790603
Company Overview |
---|
Snaphunt Pte Ltd |