Production Support / Service Reliability Engineer (Banking)

SMART INFORMATION MANAGEMENT SYSTEMS PRIVATE LIMITED
Posted date: 2 days ago
Minimum level: N/A
Job category: Production
Executive Summary
Smart IMS Inc. provides digital technology and cloud transformation services, application and infrastructure management services, unified communications, and insurance implementation services to customers across the Americas, Europe, the Middle East, and Asia-Pacific. A trusted technology and business partner of leading MNCs, including global investment banks, Smart IMS is also a Microsoft Gold Certified Partner, Oracle Platinum Partner, and AWS MSP Partner.
We are seeking a seasoned Production Support / Service Reliability Engineer to support the ongoing stability and resilience of key production systems used by multiple business functions. This position is responsible for maintaining reliable application operations and managing production incidents, working closely with vendors.
The role is operationally hands-on and requires close engagement with business users. The focus is on minimizing disruptions, responding effectively to live issues, and continuously improving system reliability.
Key Responsibilities:
Application / Production Support & Service Reliability
- Provide ongoing production support for mission-critical applications.
- Ensure high availability and performance of systems, particularly during market and business hours.
- Serve as the main escalation point for live production issues affecting users.
- Arrange and manage support coverage, including extended or after-hours support when necessary.
Incident & Issue Management
- Take ownership of major production incidents from detection through resolution.
- Deliver timely and clear updates to stakeholders during incidents.
- Perform post-incident reviews, identify root causes, and drive long-term corrective actions.
- Monitor recurring problems and partner with technology teams to reduce repeat occurrences.
Operational Readiness & Controls
- Coordinate daily system checks and readiness activities before market opening.
- Ensure systems are prepared for releases, business events, and operational changes.
- Participate in release readiness and go/no-go decision-making processes.
Vendor & Third-Party Coordination
- Manage production issues involving external vendors and SaaS providers.
- Monitor vendor response times, SLAs, and service quality.
- Escalate gaps in monitoring, alerting, or communication and ensure follow-up actions are completed.
Stability & Continuous Improvement
- Work closely with engineering and delivery teams to protect production stability during changes.
- Recommend enhancements to monitoring, alerting, and support workflows.
- Contribute to the development and maintenance of runbooks, escalation procedures, and operational documentation.
Business & Stakeholder Communication
- Engage regularly with business users, operations teams, compliance, and management.
- Provide concise incident summaries and operational updates to senior stakeholders.
- Maintain a composed and professional presence during critical and high-impact situations.
Required Skills:
- Minimum 4-6 years of experience in production support, IT operations, or incident management roles.
- Strong background supporting business-critical systems within financial services or regulated environments.
- Demonstrated experience handling major incidents and operating under pressure.
- Hands-on experience working with external vendors and SaaS service providers.
- Excellent verbal and written communication skills, including interaction with senior stakeholders.
- Solid understanding of ITSM / ITIL practices (certification is an advantage but not mandatory).
- Willingness to support operations during public holidays and provide occasional overnight or off-hours coverage.
Preferred Skills / Experience:
- Exposure to securities trading, brokerage platforms, or market-facing systems.
- Experience in service reliability, problem management, or change management functions.
- Awareness of regulatory, audit, and compliance requirements in financial institutions.
- Familiarity with monitoring solutions, incident management tools, and operational runbooks.
JOB SUMMARY
Singapore
Full-time