Senior Software Development Engineer, EC2 Trainium AI Infra

Lead the EC2 UltraServer Provisioning team to scale AI infrastructure on Trainium UltraServers
Seattle
Senior
22 hours ago
Amazon

Global e-commerce and cloud computing leader offering online retail, digital content, and scalable web services to consumers and businesses worldwide.

Software Development Engineer

The Software Development Engineer will lead the team in the technical strategy, design, build, and operation of infrastructure services, including the provisioning and availability of AWS Trainium-based AI servers. This role requires expertise in architecting large-scale systems and building microservices, as well as cross-functional collaboration with teams such as capacity management, hardware engineering, and datacenter operations to manage AI/ML infrastructure.

Key job responsibilities:

  • Design and develop innovative technologies that power the infrastructure supporting AI workloads on UltraServers.
  • Lead technical projects establishing EC2 as the pioneer in cloud computing for AI/ML workloads across diverse applications including LLMs, multimodal systems, and emerging model architectures.
  • Collaborate with various teams to influence the architecture of provisioning systems and improve them to operate efficiently at scale.
  • Build customer relationships by investigating complex performance challenges, developing solutions, and publishing actionable best practices through multiple channels.

About the team:

The EC2 UltraServer Provisioning team is a high-performing engineering organization responsible for delivering AWS Trainium-based UltraServer infrastructure at scale. We manage end-to-end provisioning workflows from host ingestion through testing, repair, and recovery.

Engineering