Service Reliability Engineer, G&A Solutions Engineering

Do you have a passion for ensuring the reliability, scalability, and performance of critical services? Are you a highly motivated, expert engineer with a strong understanding of Site Reliability Engineering (SRE) principles and a desire to automate and improve processes? Join Apple's General and Administrative (G&A) Solutions Engineering team as a Service Reliability Engineer and play a vital role in supporting our global, mission-critical production systems.

You'll be at the forefront of maintaining the health, stability, and efficiency of our services, working with a diverse range of technologies and platforms. You will collaborate with Engineers, Data Engineers, DBAs, and network specialists to proactively identify and resolve potential issues, automate repetitive tasks, and drive continuous improvement initiatives. Your expertise will directly impact the reliability of our systems, enabling Apple to deliver innovative products and services to our customers.

Responsibilities

Proactively monitor service performance, identify potential bottlenecks, and implement solutions to optimize efficiency and resilience
Lead incident response efforts, driving rapid resolution and conducting thorough root cause analysis (RCA)
Develop and implement automation strategies to streamline operational tasks, improve service resilience, and reduce manual intervention
Apply SRE principles to maintain highly reliable and scalable service infrastructure
Collaborate closely with development teams to ensure that new services are designed for operational excellence, incorporating best practices for monitoring, alerting, and scalability
Contribute to the creation and maintenance of comprehensive documentation, including runbooks and service level objectives (SLOs)
Participate in on-call rotations, providing 24/7 support for critical services and responding to incidents with a sense of urgency
Find opportunities for process improvement and drive initiatives to enhance the efficiency and effectiveness of the service reliability team
Champion a culture of continuous learning and knowledge sharing within the team
Define and monitor key service level indicators (SLIs) to measure and improve service reliability

Minimum Qualifications

4+ years of experience in a Site Reliability Engineering, DevOps, or related role, supporting large-scale, enterprise-level services
Strong proficiency in at least one programming language (e.g., Python, Java, Go) and scripting languages (e.g., Bash, PowerShell)
Experience with cloud platforms (e.g., AWS, Azure, GCP) and cloud-native technologies (e.g., Kubernetes, Docker)
Hands-on experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, Datadog)
Bachelor's degree in Computer Science or equivalent work-related experience

Preferred Qualifications

Familiarity with CI/CD pipelines and DevOps practices
Experience with database technologies (e.g., MySQL, PostgreSQL, NoSQL databases)
Knowledge of ITIL frameworks and incident management processes
Experience with vibe coding
Understanding of Linux/Unix system administration
Experience with configuration management tools (e.g., Ansible, Chef, Puppet)

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.
