AIML - Sr Software Engineer, Machine Learning Platform Technologies

Build self-managing, agentic infrastructure platforms for large-scale ML workloads
Cupertino, California, United States
Senior
$212,000 – $318,400 USD / year
Posted 17 hours ago
Apple

A multinational technology company known for its consumer electronics, software, and online services, including the iPhone, iPad, and Mac computers.

Are you an open-source contributor passionate about building the next generation of cloud-native ML infrastructure? We're seeking a hands-on technical leader with deep expertise in Kubernetes, Crossplane, Golang/Rust, and agentic workflows to design and scale the platforms that power Apple's Siri, Search, and AI/ML ecosystems. If you've contributed to CNCF projects such as Crossplane, ArgoCD, or Kubernetes, and you're driven to build infrastructure for ML training and inference—including optimizing for performance, cost, and automation—this role is for you. You'll architect at Apple scale, developing intelligent, declarative, and self-managing infrastructure that enables billions of seamless user experiences.

Description

Our MLPT Cloud Infrastructure Team within Apple's AI/ML organization designs, builds, and scales the foundational systems that power Siri, Search, and next-generation ML workloads. We're reimagining how infrastructure is managed—through agentic, event-driven workflows, Crossplane compositions, and self-healing control planes—to deliver Model Context Protocol (MCP)–based infrastructure servers that integrate seamlessly with ML and data workflows. You'll work closely with AI/ML engineers, SREs, and platform teams to deliver infrastructure that is automated, observable, and efficient across Apple-scale hybrid and multi-cloud environments.

Responsibilities

Architect and develop cloud-native, agentic infrastructure platforms supporting ML training, inference, and large-scale distributed systems.
Lead and mentor engineers building Crossplane-based control planes, Kubernetes operators, and ArgoCD-driven GitOps automation.
Design, build, and optimize Model Context Protocol (MCP) servers that manage and contextualize infrastructure and application state across environments.
Contribute to and upstream improvements in open-source CNCF projects, representing Apple in the cloud-native community.
Implement observability, governance, and automation frameworks to ensure performance, reliability, and compliance.
Collaborate with AI/ML and infrastructure teams to integrate agentic orchestration workflows for self-service provisioning, ML pipeline management, and dynamic scaling.
Drive best practices for GitOps, IaC, and Kubernetes cluster lifecycle automation at global scale.
Ensure systems are resilient, secure, and optimized for cost and performance across on-prem and multi-cloud environments.

Minimum Qualifications

BS/MS in Computer Science or related field (or equivalent practical experience).
5+ years of experience in distributed systems or cloud infrastructure engineering.
Strong programming experience in Golang and/or Rust; expertise in building controllers, operators, or automation systems.
Deep understanding of Kubernetes internals, controller-runtime, and Crossplane composition frameworks.
Experience with ArgoCD, Helm, and Infrastructure-as-Code (Terraform, Pulumi, or Crossplane).
Hands-on experience with GitOps, declarative configuration, and reconciliation-driven workflows.
Proven ability to design and operate infrastructure for ML training and inference, including performance tuning and GPU optimization.
Experience leading technical teams, driving architecture decisions, and mentoring engineers.
Strong grounding in cloud cost efficiency, performance profiling, and system-level debugging.

Preferred Qualifications

9+ years in cloud infrastructure, SRE, or distributed systems roles.
Active contributor to CNCF open-source projects (e.g., Kubernetes, Crossplane, ArgoCD, Envoy, Prometheus).
Deep expertise in Kubernetes API machinery, custom resources (CRDs), and control plane development.
Experience with Model Context Protocol (MCP)–based systems or contextual orchestration servers.
Familiarity with AIOps or agentic AI workflows in production environments.
Strong understanding of observability, telemetry, and distributed tracing (OpenTelemetry, Prometheus, Grafana).
Proven experience building ML infrastructure platforms (training clusters, inference services, model registries).
Excellent communication, technical writing, and cross-functional leadership skills.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $212,000 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become Apple shareholders through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
