Lead AI Engineer (Mexico City), Data Solutions Org
Hybrid
We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce. This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel: a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance.
This role sits on the applied side but requires strong data and systems engineering depth. You will build not just models and agents, but also the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to improve continuously in production. You will build production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.
What You'll Do
1) Build the Agent Flywheel
- Design and implement feedback loops that enable agents and ML models to self-improve over time
- Develop systems for:
  - Outcome tracking (e.g., engagement, conversions, resolution quality)
  - Agent evaluation (LLM + deterministic + human-in-the-loop signals)
  - Iterative optimization (prompting, policies, model selection, fine-tuning)
- Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets
- Close the loop from production signals → evaluation → model/prompt improvements
2) Develop Production ML & Agent Systems
- Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)
- Design and implement AI agents that combine:
  - LLM reasoning
  - Tool/API usage
  - ML-based decisioning layers
- Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows
- Integrate ML and agent capabilities into decisioning systems that drive business outcomes
3) Data & Pipeline Engineering
- Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows
- Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets
- Pair model pipelines with data pipelines to enable continuous retraining and evaluation loops
- Ensure data quality, consistency, and availability across systems
- Work with large-scale structured and unstructured data to support both ML and LLM systems
4) Evaluation, Experimentation & Optimization
- Build offline and online evaluation frameworks for agent and ML model performance
- Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior
- Design and run A/B experiments to measure impact on business outcomes
- Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)
- Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)
5) Architecture & Applied Systems Design
- Develop hybrid systems that blend:
  - Deterministic logic
  - Model-based scoring
  - LLM-driven generation
- Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top
- Design systems that scale with increasing agent complexity and data volume
6) Platform & API Development
- Build scalable Python services and APIs powering agent workflows
- Contribute to shared infrastructure for model serving, evaluation, and experimentation
- Ensure reliability, observability, and performance of deployed systems
Qualifications
Core Requirements
- 6+ years of experience in AI/ML engineering, applied data science, or closely related roles
- Strong hands-on experience in Python for production systems
- Proven track record building and deploying production-grade ML models
- Strong experience with data pipeline development (ETL/ELT, batch or streaming)
- Experience designing and building AI agents or agent-like systems
- Strong experience with API development and backend services
- Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)
Data & Systems Expertise
- Experience building reliable data pipelines that support ML or AI systems in production
- Familiarity with:
  - Data processing frameworks (e.g., Spark or equivalent)
  - Data orchestration tools (e.g., Airflow, Dagster)
  - Data warehousing solutions (e.g., Snowflake, BigQuery)
- Understanding of data quality, lineage, and reproducibility in ML systems
Agent & LLM Experience
- Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)
- Familiarity with agent frameworks and tool-using agents
- Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred
Modeling & Systems Thinking