Principal Software Engineer – AI Systems

Develop fault-tolerant distributed AI systems supporting multi-modal data processing
San Francisco Bay Area
Expert
$143,000 – 286,000 USD / year
Posted 18 hours ago
Walmart

A multinational retail corporation operating a chain of hypermarkets, discount department stores, and grocery stores.

AI Systems Engineer

Responsibilities

Design and implement large-scale, production-grade AI systems that integrate LLMs and Generative AI into real-world applications.

Build frameworks that support Retrieval-Augmented Generation (RAG), agentic workflows, and multi-step reasoning at scale.

Ensure models and agents are production-ready with strong observability, monitoring, and performance optimization.

Architect distributed, fault-tolerant systems capable of supporting high-throughput AI workloads.

Lead the design of modular, extensible, and reusable components to accelerate AI adoption across teams.

Build MVPs quickly, validate assumptions, and iterate toward scalable long-term solutions.

Partner with product and platform teams to integrate AI into customer-facing and enterprise-grade applications.

Define and enforce standards for APIs, services, and infrastructure that enable seamless AI adoption.

Balance functional requirements with non-functional goals such as reliability, latency, and security.

Drive technical strategy for AI initiatives and guide teams in best practices for AI-driven software development.

Mentor engineers across software and AI domains to elevate overall technical expertise.

Contribute to thought leadership in AI engineering through internal frameworks, design patterns, and reusable components.

Qualifications

12+ years of experience in software engineering (backend, distributed systems, large-scale platforms), with 2+ years applying Generative AI/LLMs in production.

Proven expertise in distributed computing, cloud-native architectures (GCP, Azure, or AWS), and systems that prioritize scalability and fault tolerance.

Strong coding skills in Python (preferred) and at least one system-level language (Java, Go, or C++).

Experience with ML/AI frameworks (PyTorch, TensorFlow, Hugging Face) is a plus, particularly when applied to building systems rather than only training models.

Deep knowledge of RAG pipelines, vector databases, and real-time data integration.

Familiarity with resilience engineering: disaster recovery, failover, monitoring, and high availability.

Exposure to multi-modal AI (text, image, video) and optimization techniques (quantization, distillation) is advantageous.

Strong grounding in system design, performance engineering, and design patterns.

Track record of delivering production systems with AI at scale, not just research or prototyping.

Equal Opportunity Employer

Walmart, Inc. is an Equal Opportunity Employer – By Choice. We believe we are best equipped to help our associates, customers, and the communities we serve live better when we really know them. That means understanding, respecting, and valuing unique styles, experiences, identities, ideas, and opinions – while being inclusive of all people.
