
Staff Generative AI Research Engineer, Multimodal, Agent Modeling - SIML

Own the development and deployment of multimodal LLMs and agentic AI capabilities at Apple
Cupertino, California, United States
$181,100 – 318,400 USD / year
Posted 23 hours ago
Apple

Designs and sells consumer electronics, software, and digital services, including smartphones, computers, wearables, and media platforms.

Are you passionate about Generative AI? Are you interested in working on groundbreaking generative modeling technologies to enrich the lives of billions of people? We are driving multiple initiatives focused on advancing generative models, and we are seeking technical leaders experienced in training, adapting, and deploying large-scale generative models. This role emphasizes multimodal understanding and generation, and the development of agentic systems that push the boundaries of what AI can achieve responsibly.

We are the Intelligence System Experience (ISE) team within Apple's software organization. The team operates at the intersection of multimodal machine learning and system experiences. It oversees a range of experiences such as System Experience (Springboard, Settings), Image Generation, Genmoji, Writing Tools, Keyboards, Pencil & Paper, and Generative Shortcuts, all powered by production-scale ML workflows. Our multidisciplinary ML teams focus on a broad spectrum of areas, including Visual Generation Foundation Models; Multimodal Understanding; Visual Understanding of People, Text, Handwriting, and Scenes; Personalization; Knowledge Extraction; Conversation Analysis; Behavioral Modeling for Proactive Suggestions; and Privacy-Preserving Learning. These innovations form the foundation of the seamless, intelligent experiences our users enjoy every day.

We are looking for senior research engineers to architect and advance multimodal LLM and Agentic AI technologies, ensuring their safe and responsible deployment in the real world. An ideal candidate will be able to lead diverse cross-functional efforts spanning ML modeling, prototyping, validation, and privacy-preserving learning. A strong foundation in machine learning and generative AI, along with a proven ability to translate research innovations into production-grade systems, is essential. Industry experience in Vision-Language multimodal modeling, Reinforcement and Preference Learning, Multimodal Safety, and Agentic AI Safety & Security is also important.

Description

We are looking for a candidate with a proven track record in applied ML research. Responsibilities in the role will include training large-scale multimodal (2D/3D vision-language) models on distributed backends, deploying efficient neural architectures on device and private cloud compute, and addressing emerging safety challenges to make models and agents robust and aligned with human values. A key focus of the position is ensuring real-world quality, emphasizing model and agent safety, fairness, and robustness. You will collaborate closely with ML researchers, software engineers, and hardware and design teams across multiple disciplines. The core responsibilities include advancing the multimodal capabilities of large language models and strengthening agentic workflows. On the user experience front, the work will involve aligning image and video content to the space of LLMs for visual actions and multi-turn interactions, enabling rich, intuitive experiences powered by agentic AI systems.

Minimum Qualifications

M.S. or Ph.D. in Electrical Engineering/Computer Science or a related field (mathematics, physics, or computer engineering) with a focus on computer vision and/or machine learning, or comparable professional experience.

Strong ML and Generative Modeling fundamentals

Experience using one or more of the following: Reinforcement Learning, Distillation, and/or Pre-training or Post-training of Multimodal-LLMs

Familiarity with distributed training

Proficiency in using ML toolkits, e.g., PyTorch

Awareness of the challenges associated with transitioning a prototype into a final product

Proven record of research innovation and demonstrated leadership in both applied research and development

Preferred Qualifications

Experience with building & deploying AI agents, LLMs for tool use, and Multimodal-LLMs

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become Apple shareholders through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Engineering