Snaphunt Pte Ltd

Sr. Site Reliability Engineer

Snaphunt Pte Ltd | Date Posted: 23-Feb-2021
Job Nature:
Position Level: Entry Level, Experienced
ITE/ NITEC/ Higher NITEC, Diploma, Bachelor's / Honours

Job Description

  • Opportunity to work alongside & learn from international teammates
  • Great work environment and positive culture
  • Attractive perks and benefits


Our client is an app monetisation and advertising platform with multiple offices around the world. With innovative tools and solutions, they are a reliable partner of businesses seeking to accelerate growth, engagement and returns.


The Job

In this role, your primary focus is to improve the long-term health of the system. You will be responsible for:

  • Maintaining the reliability of the platform and processes, and automating recurring tasks
  • Analysing systems based on data points to identify workloads critical to the business
  • Collaborating with engineering and product teams to ensure successful system operation
  • Monitoring system behavior to detect anomalies and resolving them in a timely manner
  • Supporting the stack in the event of a failure
  • Undertaking on-call responsibility, managing crises with the broader team, and communicating progress and challenges


The Profile

  • You have at least 3 years' experience in an SRE or systems administration role, with a background in software development
  • Expertise in Linux systems administration is mandatory
  • You have experience with multi-cloud computing (AWS, GCP, Azure, etc.) and building tools to automate system maintenance tasks
  • You have a strong understanding of server automation systems (Chef, Puppet, Ansible, Terraform) and monitoring tools, and the ability to define metrics to detect anomalies
  • You possess hands-on Kubernetes or Docker experience, including deployment tools (Spinnaker, Istio)
  • Scripting in any language (Go, Node.js, Bash, Python, etc.) is required
  • You have demonstrated experience with Datadog, Stackdriver, CloudWatch, Splunk, ELK or other log processing & alerting systems
  • You have experience in cloud-based networking (HAProxy, WAF, ELB, ALB, distributed multi-cloud VPC)
  • Previous professional experience writing in Golang, Java, Scala, C or C++ is a plus
  • You ideally have an understanding of various security standards, protocols and implementation details
  • Prior experience with Akamai and management of a distributed Kafka cluster is advantageous
  • You are flexible and pragmatic, with the ability to juggle multiple priorities
  • You are highly analytical and able to identify problem components based on data points


Ref: 39790603

Company Overview
Snaphunt Pte Ltd