Sr. Engineering Manager, AI Evaluation Platform

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a hands-on Engineering Manager to architect high-availability services and internal tools that enable self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for a leader who thrives in the ambiguity of new initiatives and is passionate about building scalable infrastructure.

You will build and lead the engineering team responsible for democratizing AI evaluation across the organization. Your focus will be on architecting the developer experience: designing the APIs, SDKs, and platform services that turn complex evaluation metrics into simple, self-service calls. You will work hand-in-hand with researchers to operationalize sophisticated measurement techniques, ensuring they scale reliably within our high-availability infrastructure. In this role, you will define the engineering standards for a new organization, establishing the code quality, automation, and testing rigor required to support the rapid evolution of Generative AI and agentic systems.

Responsibilities

Team Building & Leadership: Hire, mentor, and grow a diverse, high-performing team of backend and platform engineers. Foster a culture of technical excellence and rapid delivery as you build this new team from the ground up.
Technical Strategy & Roadmap: Own the engineering roadmap for the core evaluation engine. Architect the APIs, SDKs, and distributed services that power our internal platform, enabling product teams to measure Generative AI performance autonomously.
Operationalizing Science: Partner closely with Applied Scientists to translate novel metrics, judge prompts, and scoring algorithms into scalable, production-grade services. Create frameworks to evaluate not just simple responses, but also multi-turn agent trajectories and tool usage.
System Integration: Serve as a technical bridge between the research organization and the broader engineering ecosystem, ensuring our tools integrate seamlessly with existing ML infrastructure and developer workflows.
Engineering Rigor: Establish the software development lifecycle (SDLC) for the team, defining standards for code quality, automated testing, continuous integration and delivery (CI/CD), and monitoring to ensure high availability and reliability.

Minimum Qualifications

5+ years of direct engineering management experience, with a proven track record of hiring, mentoring, and retaining high-performing engineers. You have successfully managed teams that ship production-grade software.
7+ years of hands-on software engineering experience with deep proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You are capable of contributing to code reviews and architectural discussions on day one.
Customer Obsession & Product Thinking: Experience acting as a technical partner to internal customers. You can translate vague requirements from other teams into concrete engineering specifications and are comfortable prioritizing the roadmap in the absence of a dedicated Product Manager.
Demonstrated experience partnering with Data Scientists or Researchers: You have a history of taking experimental or "messy" code and refactoring it into reliable, scalable production systems.
Functional literacy in AI/ML concepts: You understand the fundamental lifecycle of machine learning (datasets, training vs. inference, evaluation metrics) and can discuss the engineering challenges involved in serving models.
Strong expertise in API Design & Internal Tools: You have architected APIs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience.
Operational excellence background: You have practical experience establishing CI/CD pipelines, containerized deployments (Docker/Kubernetes), and monitoring (Datadog/Prometheus).

Preferred Qualifications

Experience building MLOps & Platform Infrastructure: You have architected or managed teams that built the foundational infrastructure for AI, such as model registries, inference services, or feature stores (using tools like Kubernetes, Ray, or Kubeflow).
Deep familiarity with AI Evaluation Frameworks: You have used or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows.
Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components—specifically managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities.
Builder Experience: You have thrived in startup-like environments or incubated new teams within larger orgs, navigating high ambiguity to define roadmaps where none existed.

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $216,600 and $325,500, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Apple accepts applications to this posting on an ongoing basis.