Site Reliability Engineer — Human Engineering

At Apple, new ideas have a way of becoming phenomenal products, services, and customer experiences very quickly. Imagine what you could do here. Bring passion and dedication to your job and there's no telling what you could accomplish. We are a team of software engineers developing web-based tools and native applications for Apple teams. Our work empowers Apple engineers and researchers to build the products that inspire and delight millions every day.

We're looking for a Site Reliability Engineer who thinks like a systems engineer first and an operator second. You won't just keep things running; you'll shape how our platform evolves. Our team operates 50+ services across Kubernetes and AWS, handles sensitive health and research data, and is taking on several architectural shifts: new service-to-service auth patterns, event-driven pipelines, and a move from on-prem to cloud-native infrastructure. We need someone who gets excited about that kind of work, can reason about distributed systems at the design level, and is a strong enough communicator to bring the rest of the team along.

The Human Engineering Software team builds tools used across Apple for user studies, research participant management, health data collection, and privacy-preserving analytics. Our infrastructure spans Django backends, Kubernetes clusters (self-hosted and AWS), PostgreSQL, Redis, Kafka, Elasticsearch, and a growing set of internal service integrations. This is an engineering-forward SRE role: you'll spend as much time designing systems as operating them. You'll work closely with our full-stack engineers to improve how services communicate, how we observe production behavior, and how we ship changes safely. You'll have a seat at the architecture table. We want you proposing solutions, not just implementing them.

Responsibilities

Platform & Reliability Engineering - Own the reliability of our Kubernetes-hosted services across AWS and self-hosted clusters: deployments, scaling, capacity planning, certificate management, and secrets rotation. Design and implement SLO-driven observability: define meaningful SLIs and build dashboards that answer "is the system healthy?" not just "is the pod running?" Drive incident response and blameless postmortems.
Distributed Systems & Architecture - Partner with the architecture team on system design: service-to-service authentication (OIDC, gateway auth), event-driven messaging (Kafka), and API gateway patterns. Design the infrastructure layer that makes architecture proposals real in production. Evaluate and recommend new tools, patterns, and platforms. Write code when it's the right tool, whether that's a deployment operator, a health check service, or a data pipeline component. This isn't a YAML-only role.
Engineering Enablement - Make the team efficient: own CI/CD pipelines and GitOps practices, own the tests that verify our production tools are functioning correctly, build self-service automation, evolve our observability and security posture, and communicate infrastructure decisions clearly to technical and non-technical stakeholders.

Minimum Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 3+ years of experience in distributed systems
Deep experience with Kubernetes in production — cluster operations, networking, storage, troubleshooting
Strong proficiency designing and operating services in AWS (EC2, EKS, RDS, S3, IAM, VPC)
Hands-on infrastructure-as-code experience (Terraform, Helm, or equivalent)
Proficiency in at least one backend language (Python, Go, or similar) — you can write production services, not just scripts
Experience with CI/CD pipeline design and GitOps workflows
Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewall rules, service discovery
Excellent communication skills. You can explain a complex system to a room of engineers who didn't build it
Experience building internal automation or self-service tooling (Slack bots, CLI tools, workflow orchestration) that reduced manual operational work

Preferred Qualifications

BS in Computer Science, Engineering, or equivalent practical experience, with 5+ years of experience in distributed systems
Experience with event-driven architectures (Kafka, RabbitMQ, or similar messaging systems)
Experience with service mesh or API gateway patterns (Istio, Envoy, Kong, or similar)
Familiarity with Django/Python web applications and their operational characteristics (Celery, Gunicorn, PostgreSQL)
Experience with observability tooling beyond basic monitoring: distributed tracing, SLO frameworks, structured logging
Background working with sensitive data (health data, PII) and associated compliance requirements
Experience leading incident response and building on-call culture
Contributions to internal or open-source infrastructure tooling

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $272,100, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Apple accepts applications to this posting on an ongoing basis.
