Lead AI Engineer

Lead development of production-grade AI agents and the agent flywheel at Salesforce
Mexico City, Mexico
Senior
Salesforce

Lead AI Engineer (Mexico City) Data Solutions Org

Hybrid

We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce. This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel (a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance). The role sits on the applied side but requires strong data and systems engineering depth: you will build not just models and agents, but also the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to continuously improve in production. You will ship production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.

What You'll Do

1) Build the Agent Flywheel

  • Design and implement feedback loops that enable agents and ML models to self-improve over time
  • Develop systems for:
    • Outcome tracking (e.g., engagement, conversions, resolution quality)
    • Agent evaluation (LLM + deterministic + human-in-the-loop signals)
    • Iterative optimization (prompting, policies, model selection, fine-tuning)
  • Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets
  • Close the loop from production signals → evaluation → model/prompt improvements
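To give a flavor of the flywheel's final step, here is a minimal sketch of turning production agent traces into an evaluation dataset. The trace schema and field names are illustrative assumptions, not an actual Salesforce format.

```python
from dataclasses import dataclass

# Hypothetical trace schema; field names are illustrative only.
@dataclass
class AgentTrace:
    inputs: str
    tool_calls: list
    output: str
    outcome: float  # production signal, e.g. resolution quality in [0, 1]

def build_eval_dataset(traces, min_outcome=0.8):
    """Close the loop: keep high-outcome traces as (input, expected) evaluation cases."""
    return [
        {"input": t.inputs, "expected": t.output}
        for t in traces
        if t.outcome >= min_outcome
    ]

traces = [
    AgentTrace("refund request #1", ["lookup_order"], "refund issued", 0.95),
    AgentTrace("refund request #2", ["lookup_order"], "escalated", 0.40),
]
dataset = build_eval_dataset(traces)
```

In a real flywheel the outcome threshold, labeling signals, and dataset format would come from the evaluation systems described below, but the shape of the loop is the same: production signals in, curated training and evaluation data out.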

2) Develop Production ML & Agent Systems

  • Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)
  • Design and implement AI agents that combine:
    • LLM reasoning
    • Tool/API usage
    • ML-based decisioning layers
  • Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows
  • Integrate ML and agent capabilities into decisioning systems that drive business outcomes
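As a sketch of the reusable agent patterns mentioned above, the following toy example runs a multi-step plan of tool calls and emits a structured (JSON) output. The tool registry and tool names are hypothetical stand-ins for real tool/API integrations.

```python
import json

# Illustrative tool registry; tool names and signatures are hypothetical.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def run_agent(plan):
    """Execute a multi-step plan of (tool, argument) calls; emit structured JSON."""
    steps = []
    for tool_name, arg in plan:
        result = TOOLS[tool_name](arg)
        steps.append({"tool": tool_name, "result": result})
    return json.dumps({"steps": steps}, sort_keys=True)

out = run_agent([("lookup_order", "A-123")])
```

In production the plan would come from LLM reasoning rather than a hard-coded list, and each step's trace would feed the flywheel pipelines described earlier.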

3) Data & Pipeline Engineering

  • Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows
  • Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets
  • Connect model pipelines to data pipelines to enable continuous retraining and evaluation loops
  • Ensure data quality, consistency, and availability across systems
  • Work with large-scale structured and unstructured data to support both ML and LLM systems
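The feature/label transformation above can be sketched in miniature: raw interaction events aggregated into per-user features with a conversion label. The event schema is an assumption for illustration.

```python
from collections import defaultdict

def interactions_to_features(events):
    """Aggregate raw interaction events into per-user features and a label."""
    features = defaultdict(lambda: {"n_events": 0, "converted": 0})
    for e in events:
        row = features[e["user"]]
        row["n_events"] += 1
        # Label: did the user ever convert (hypothetical "purchase" event)?
        row["converted"] = max(row["converted"], int(e["event"] == "purchase"))
    return dict(features)

events = [
    {"user": "u1", "event": "view"},
    {"user": "u1", "event": "purchase"},
    {"user": "u2", "event": "view"},
]
table = interactions_to_features(events)
```

At scale this aggregation would run in a framework like Spark on batch or streaming input, but the feature/label contract it produces is what the training and evaluation loops consume.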

4) Evaluation, Experimentation & Optimization

  • Build offline and online evaluation frameworks for agent and ML model performance
  • Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior
  • Design and run A/B experiments to measure impact on business outcomes
  • Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)
  • Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)
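A regression-style test set for agent behavior, as described above, can be as simple as replaying golden cases and computing a pass rate. The agent stand-in below is a trivial placeholder, not a real system.

```python
def regression_eval(agent_fn, golden):
    """Replay golden cases through the agent and return the exact-match pass rate."""
    passed = sum(1 for case in golden if agent_fn(case["input"]) == case["expected"])
    return passed / len(golden)

golden = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]
agent = lambda q: str(eval(q))  # trivial stand-in for a real agent
rate = regression_eval(agent, golden)
```

Real evaluation frameworks add fuzzier scoring (LLM judges, rubric checks) on top, but an exact-match regression gate like this is the baseline that catches behavioral drift before deployment.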

5) Architecture & Applied Systems Design

  • Develop hybrid systems that blend:
    • Deterministic logic
    • Model-based scoring
    • LLM-driven generation
  • Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top
  • Design systems that scale with increasing agent complexity and data volume
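The three-layer blend above can be illustrated in a few lines: deterministic rules short-circuit first, a model score gates low-confidence cases, and generation handles the rest. Thresholds and rules here are invented for the sketch; the generator is injected as a callable so the example stays runnable without an LLM.

```python
def hybrid_respond(query, ml_score, generate, threshold=0.5):
    """Route a query through deterministic, model-scored, and generative layers."""
    # 1. Deterministic logic: hard business rules override everything else
    if "delete my account" in query:
        return "routed_to_human"
    # 2. Model-based scoring: gate low-confidence cases for review
    if ml_score < threshold:
        return "needs_review"
    # 3. LLM-driven generation (stubbed via an injected callable)
    return generate(query)

reply = hybrid_respond("order status?", 0.9, lambda q: f"answer to: {q}")
```

The ordering is the design point: deterministic guardrails are cheapest and safest, so they run first; generation is most expensive and least predictable, so it runs last and only behind the score gate.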

6) Platform & API Development

  • Build scalable Python services and APIs powering agent workflows
  • Contribute to shared infrastructure for model serving, evaluation, and experimentation
  • Ensure reliability, observability, and performance of deployed systems
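As a framework-agnostic sketch of the service surface, a handler like the one below could sit behind any Python web framework. The routes and scoring logic are hypothetical.

```python
def handle_request(path, body):
    """Minimal request handler sketch for a model-serving API (framework-agnostic)."""
    if path == "/health":
        # Observability hook: liveness check for monitoring
        return 200, {"status": "ok"}
    if path == "/score":
        # Hypothetical scoring logic standing in for a deployed model
        score = min(1.0, len(body.get("text", "")) / 100)
        return 200, {"score": score}
    return 404, {"error": "not found"}

status, payload = handle_request("/health", {})
```

Keeping the handler a pure function of (path, body) makes it easy to unit-test and to wrap with logging, metrics, and tracing middleware.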

Qualifications

Core Requirements

  • 6+ years of experience in AI/ML engineering, applied data science, or closely related roles
  • Strong hands-on experience in Python for production systems
  • Proven track record building and deploying production-grade ML models
  • Strong experience with data pipeline development (ETL/ELT, batch or streaming)
  • Experience designing and building AI agents or agent-like systems
  • Strong experience with API development and backend services
  • Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)

Data & Systems Expertise

  • Experience building reliable data pipelines that support ML or AI systems in production
  • Familiarity with:
    • Data processing frameworks (e.g., Spark or equivalent)
    • Data orchestration tools (e.g., Airflow, Dagster, etc.)
    • Data warehousing solutions (e.g., Snowflake, BigQuery, etc.)
  • Understanding of data quality, lineage, and reproducibility in ML systems

Agent & LLM Experience

  • Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)
  • Familiarity with agent frameworks and tool-using agents
  • Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred

Modeling & Systems Thinking

  • Strong understanding of:
    • Supervised learning (classification, regression, ranking)
    • Evaluation methodologies (offline + online)
    • Experimentation (A/B testing, causal inference basics)
  • Ability to design systems that combine:
    • ML models
    • LLMs
    • Business logic

Engineering & Production Skills

  • Experience deploying models/services in production environments
  • Familiarity with:
    • Model serving architectures
    • Data pipelines
    • Monitoring and observability
  • Ability to write clean, scalable, maintainable code

Preferred Qualifications

  • Experience building model-driven agent improvement systems (e.g., scoring, gating, auto-optimization)
  • Experience with reinforcement learning, bandits, or iterative optimization systems
  • Exposure to agent evaluation tools (e.g., LangSmith, Braintrust, or similar)
  • Experience with large-scale experimentation platforms
  • Familiarity with enterprise SaaS or CRM domains

What Success Looks Like

  • Agents and production-grade ML models measurably improve over time via automated feedback loops
  • Well-structured data and evaluation pipelines continuously feed the agent flywheel
  • Clear lift in key business metrics (e.g., engagement, conversion, revenue impact)
  • Robust evaluation systems that enable rapid iteration and safe deployment