Lead Site Reliability Engineer

Design and implement comprehensive monitoring solutions for Replit's global infrastructure.
Foster City, California, United States
Senior
4 days ago

✨ About The Role

- The role involves ensuring the reliability, scalability, and performance of Replit's infrastructure that serves millions of developers worldwide.
- Responsibilities include designing and implementing robust monitoring solutions and automating operational tasks to improve infrastructure reliability.
- The candidate will work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Leading incident response efforts and developing runbooks for critical services will be key components of the job.
- The position requires identifying and resolving performance bottlenecks and optimizing resource utilization across the infrastructure.

⚡ Requirements

- The ideal candidate will have over 5 years of experience in Site Reliability Engineering or similar roles such as DevOps or Systems Engineering.
- A strong programming background in languages like Python or Go is essential for automating tasks and building resilient systems.
- Candidates should possess a deep understanding of distributed systems and cloud infrastructure, particularly with container orchestration platforms like Kubernetes.
- Strong incident management skills are necessary, with experience leading incident response efforts and conducting post-mortems.
- A passion for continuous learning and staying current with industry best practices and new technologies is highly valued.

Engineering
About Replit
Collaborative in-browser IDE