✨ About The Role
- The Lead Site Reliability Engineer will be responsible for leading and scaling the Site Reliability Engineering team.
- This role involves designing, building, and deploying Sporttrade’s cloud and on-premise infrastructure.
- The candidate will provide primary operational support and engineering for various Sporttrade services.
- A key duty is leading incident response, root cause analysis, and post-mortem reviews to improve system reliability.
- The position requires designing, deploying, and maintaining infrastructure to meet scalability, performance, and security requirements.
- The candidate will develop and maintain automation scripts and tools for deployment, monitoring, and incident management.
- The role includes owning the management and structure of the Site Reliability Engineering team.
- The Lead Site Reliability Engineer will also be responsible for measuring and communicating key metrics for each environment at Sporttrade.
⚡ Requirements
- The ideal candidate will have excellent working knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Strong experience with cloud providers, particularly AWS and GCP, is essential for this role.
- Hands-on experience with Kubernetes is preferred, with GKE experience being a bonus.
- The candidate should possess extensive experience with CI/CD tools like Jenkins or GitHub Actions.
- Proficiency in Python or Bash scripting for automation and system management is required.
- Strong problem-solving skills and the ability to thrive in a fast-paced environment are crucial for success.
- Excellent communication and leadership skills are necessary to work collaboratively with cross-functional teams.