
Software Development Engineer, ML Systems Integration, Machine Learning Israel (mlil) — Integration Validation

Lead integration validation and CI/CD pipeline efforts for ML inference systems
Tel Aviv
Senior
Posted 6 hours ago
Amazon

Global e-commerce and cloud computing leader offering online retail, digital content, and scalable web services to consumers and businesses worldwide.

Senior Software Development Engineer

Annapurna Labs designs silicon and software that accelerates innovation. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before and to deliver results that help our customers change the world.

The Integration team is looking for a Senior Software Development Engineer to lead the design and delivery of systems software for our next-generation ML accelerator servers. In this role you will own the design and implementation of CI/CD pipelines, test frameworks, and system-level validation for our ML inference accelerator platform. You will work across the full stack, from firmware interfaces through data-plane performance benchmarking to production fleet readiness, ensuring every component is validated end-to-end before it reaches customers.

This is a greenfield environment with rapidly growing scope: new silicon, new software stacks (vLLM, NKI, NIXL), and new fleet-scale challenges. We are looking for a senior individual contributor (IC) who can independently drive technical decisions, scale our validation infrastructure, and raise the bar on engineering quality across the group.

Key job responsibilities:

  • Own and evolve CI/CD pipelines — from pre-merge gates through continuous deployment to fleet.
  • Design and implement test frameworks that enable firmware and data-plane developers to write, run, and maintain tests with minimal friction.
  • Architect system-level test suites that stress control-plane and data-plane components beyond provisioning and vetting flows.
  • Build and maintain performance benchmarking infrastructure for LLM inference workloads (Prefill + Decode), including dashboarding and regression detection.
  • Drive integration of third-party vendor code (nightly drops) into CI/CD, ensuring quality gates catch regressions early.
  • Participate in feature design reviews, contributing test plans and challenging coverage gaps.
  • Define and own Continuous Testing (CTS) in production environments.
  • Leverage AI-assisted development tools (Kiro, LLM-based code generation) to accelerate team velocity and pioneer new engineering workflows.
