Site Reliability Engineer

Design and implement comprehensive monitoring solutions to enhance system reliability and performance.
Remote
Mid-Level
1 week ago

✨ About The Role

- The Site Reliability Engineer will be responsible for ensuring the reliability, scalability, and performance of Replit's infrastructure.
- This role involves designing and implementing observability solutions to monitor system health and performance.
- The engineer will drive automation and infrastructure as code using tools like Terraform and Ansible.
- Establishing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in collaboration with product and engineering teams is a key responsibility.
- The role includes leading incident management efforts and conducting post-mortems to improve future responses.

⚡ Requirements

- The ideal candidate will have at least 3 years of experience in Site Reliability Engineering or similar roles such as DevOps or Systems Engineering.
- Strong programming skills in languages like Python or Go are essential for automating tasks and building resilient systems.
- A deep understanding of distributed systems is crucial for ensuring the reliability and performance of the infrastructure.
- Candidates should have experience with container orchestration platforms, particularly Kubernetes, and cloud-native technologies.
- Strong incident management skills are necessary, with a proven track record of leading incident response efforts.

Site Reliability Engineer
Remote
Engineering
About Replit
Collaborative in-browser IDE