
Staff Operations Engineer (DevOps/MLOps)

Own end-to-end MLOps platform development for scalable, reliable production ML at Adobe.
San Jose, California, United States
Senior
$159,200 – $301,600 USD / year
19 hours ago
Adobe

Provides creative, marketing, and document management software and cloud services for designing, publishing, and managing digital content.

Staff Software Development Engineer

We are seeking an experienced Staff Software Development Engineer with deep expertise in cloud infrastructure and a passion for building scalable, production-grade ML systems. As part of the Applied Research and Technology Services organization, you will play a meaningful role in crafting the operational backbone for high-performance, reliable, and globally scaled machine learning services. You will work closely with cross-functional collaborators, including Adobe Research, Adobe AI Platforms, and product engineering teams, to architect solutions that speed up innovation and improve service resilience.

You will own technical direction for core service infrastructure and MLOps, influence architectural decisions across multiple teams, and raise the operational maturity of the organization through standards, reusable platforms, and mentorship. You will also provide technical leadership and define, document, and enforce guidelines adopted across teams, and you will evaluate and introduce new infrastructure, optimization, and agentic technologies with clear value and adoption plans. This position is ideal for someone who thrives at the intersection of DevOps, MLOps, systems engineering, and automation.

Key Responsibilities

Build and automate cloud infrastructure provisioning, scaling, and deployments using industry-standard tools and infrastructure-as-code practices.

Architect and implement end-to-end MLOps pipelines for packaging, deploying, and monitoring large-scale ML services.

Build and integrate telemetry agents to capture operational, performance, and inference metrics across distributed ML services.

Build backend dashboards and observability workflows that surface quality, performance, traffic, and reliability insights for ML services.

Lead the development of Agentic Ops solutions to optimize large-scale ML production workflows, reduce MTTR, and increase service engineering productivity.

Develop and maintain robust CI/CD pipelines (e.g., GitLab CI, GitHub Actions, Jenkins) enabling automated model conversion, optimization (PTQ/QAT), and artifact packaging.

Drive standards in reliability, cost optimization, and operational readiness across service deployments.

Qualifications

8+ years of experience in DevOps, SRE, or cloud infrastructure engineering roles.

Demonstrated experience designing and managing MLOps lifecycles, including model deployment, inference optimization, and production monitoring.

Strong knowledge of CI/CD methodologies and tools such as GitOps, Docker, Terraform, GitHub Actions, GitLab CI, or Jenkins.

Hands-on expertise with Kubernetes orchestration, including frameworks such as Kubeflow, Argo Workflows, or similar systems.

Strong programming skills in Python, with experience building automation tooling for ML or DevOps workflows.

Proficiency with observability and monitoring platforms (e.g., Prometheus, Grafana, Splunk, New Relic) for building reliable production systems.

Experience optimizing distributed architectures for cost efficiency, reliability, and performance.

Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow) and model optimization tools (e.g., ONNX, TensorRT, TFLite, AOT) is a strong plus.
