Our Staff Backend Engineers make a real impact on the safety and ROI of large language models and agentic applications across many verticals and domains. You will work at the cutting edge, envisioning and building new types of tools and algorithms that monitor, explain, and improve these applications and, in turn, empower our customers.
Our engineering team is a dynamic group of builders and thinkers dedicated to solving some of the most cutting-edge challenges in AI safety and reliability. We work on an exciting and expansive range of topics, from the responsible deployment of machine learning models and large language models (LLMs) to complex agentic applications. Our projects are inherently cross-disciplinary, requiring expertise in systems engineering, product engineering, and data science to build robust, scalable solutions. We thrive in a collaborative environment where continuous learning comes first, so every team member keeps up with the latest advancements in AI. Joining our team means you'll have the opportunity to make a tangible impact on how AI evolves for the benefit of humanity.
Design and build core services and components of a world-class cloud platform that helps enterprises develop, monitor, and improve their full suite of AI-based applications (covering predictive models, LLMs, GenAI models, and agentic applications)
Lead the design and implementation of distributed systems and microservices that compute, persist, and expose new ML and agentic observability metrics (e.g., response relevancy, hallucination scores) from raw trace data
Design scalable, enterprise-grade data infrastructure, services, and APIs that support enterprise-scale workloads and meet compliance requirements and SLAs
Spearhead the development of new types of metrics and evaluation capabilities to meet evolving customer needs, and take part in customer conversations around discovery and support
Define and evolve the operational maturity (reliability, latency, SLOs, observability) of core services, establish best practices and champion improvements to internal CI/CD processes, testing frameworks, error handling, efficiency and resiliency
Team & Culture Building: you will help build a world-class engineering team and actively participate in the talent acquisition process through interviewing, candidate evaluation, and coaching
Master's or Bachelor's degree in Computer Science or a related field, combined with 7+ years of industry experience and a demonstrated solid foundation in software development.
Deep proficiency with Python and a strong command of essential backend technologies such as Postgres, Redis, Kafka, RabbitMQ, and Ray, including the ability to design, build, and debug complex, large-scale systems.
Experience deploying and working with ML/LLM models in production. The candidate should be comfortable with modern LLM frameworks (e.g., LangChain, Hugging Face, vLLM) and evaluation frameworks (e.g., Ragas, MLflow) to ensure model performance and reliability.
Adaptability & Ownership: proven ability to thrive in ambiguity and a fast-paced environment. We need a self-motivated initiator who can take ownership of projects with a high degree of autonomy, confidently filling in the gaps when the full picture isn't available.
System Design & Optimization: a strong grasp of distributed systems and the ability to troubleshoot production issues. Experience with cloud infrastructure (AWS/GCP, Kubernetes) and specialized databases (ClickHouse/Druid) is a nice-to-have, indicating a deeper understanding of system architecture and performance optimization.
Technical Leadership & Collaboration: Demonstrated ability to plan, execute, and deliver projects by effectively breaking down complex problems into manageable tasks, and guiding a small team of engineers. Must be adept at cross-functional collaboration across a geographically distributed team, working closely with product managers, designers, frontend developers, and data scientists to ensure alignment and successful project outcomes
Coaching & Mentorship: you should be an excellent collaborator and a mentor to other team members, raising the technical bar for the entire team and regularly engaging in code and design reviews.
Ability to work in our Palo Alto office 3 days a week
$190,000 - $300,000 + equity + benefits
The posted range represents the expected salary range for this job and does not include any other potential components of the compensation package and perks previously outlined. Ultimately, in determining pay, we'll consider your experience, leveling, location, and other job-related factors.
Fiddler is proud to be an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. If you require special accommodations in order to complete the interviews or perform job duties, please inform the recruiter at the beginning of the process.
Beware of job scam fraud. Our recruiters use @fiddler.ai email addresses exclusively. In the US, we do not conduct interviews via text or instant message, or ask for sensitive personal information such as bank account or social security numbers.