Senior Cloud Infrastructure Engineer (SRE)

Build automated incident response systems for critical government services
Singapore
Senior
1 month ago
Assurity Trusted Solutions

A Singapore-based company providing digital security services such as secure digital identities and authentication for individuals and businesses.

Digital Resiliency Engineering Role

In Digital Resiliency Engineering (DRE), we combine software and systems engineering to build and operate large-scale and distributed systems designed and/or built by the Singapore Government. We ensure Government services are reliable, meet expected performance, and satisfy customer needs.

If you have a strong DevOps, infrastructure engineering, and/or SRE background, have experience operating mission-critical production infrastructure at scale, and want to work with a team of practitioners and leading industry experts, we welcome you to join us.

In this role, you will build central services for observability and automation of infrastructure services. You will be part of a rotation with other engineers providing rapid response to major incidents impacting critical Government services. You will provide technical leadership for the team and work closely with technical leads to operate highly available solutions. You will also guide other team members on managing the availability and performance of mission-critical services, building automation and monitoring solutions to prevent problem recurrence, and building automated responses for non-exceptional service conditions.

You will manage the execution of project priorities, deadlines, and deliverables. You will also lead the design of major components, systems, and features to improve the availability, scalability, latency, and efficiency of services designed and built by the Government.

Key Responsibilities:

  • Build Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and post-mortem incident processes.
  • As part of an on-call roster, ensure the reliability and performance of critical Government services. Provide operational support and engineering for large-scale and distributed systems to drive effective incident resolution.
  • Gather and analyze metrics and logs from operating systems and/or applications for capacity planning, performance tuning, and fault isolation.
  • Build automation to manage services, infrastructure, and/or applications.
  • Improve reliability and quality of services using proactive monitoring.
  • Measure and optimize system performance, continuously improving and advancing SRE practice.
  • Build an SRE playbook that the Whole-of-Government can use as a reference.
  • Identify potential and emerging technologies relevant to innovation for the Government.
  • Work in a cross-functional service team consisting of software engineers, infrastructure engineers, DevOps, and other specialists.