✨ About The Role
- Design and implement solutions to ensure the scalability of infrastructure to meet rapidly increasing demands
- Collaborate with development teams to enhance the reliability of systems and implement monitoring systems for proactive issue identification
- Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure and ensure system reliability
- Implement fault-tolerant design patterns, build automation tools, and participate in an on-call rotation to respond to critical incidents
- Work closely with cross-functional teams to bring new features and research capabilities to users while maintaining system stability and performance
⚡ Requirements
- Experienced reliability engineer with a track record of accelerating engineering reliability in a fast-paced, rapidly scaling company
- Proficient in cloud infrastructure, programming/scripting languages, and container orchestration technologies like Kubernetes
- Skilled in collaborating with cross-functional teams to ensure reliability and scalability in the design and development of new features and services
- Strong problem-solving and troubleshooting skills with experience in implementing fault-tolerant and resilient design patterns
- Committed to creating a diverse, equitable, and inclusive culture while fostering radical candor and challenging groupthink