Software Development Engineer, EC2 Trainium AI Infra

Build and maintain scalable cloud-based provisioning and recovery systems for AWS Trainium UltraServers
Seattle
Mid-Level
Posted 23 hours ago
Amazon

Global e-commerce and cloud computing leader offering online retail, digital content, and scalable web services to consumers and businesses worldwide.

Software Development Engineer

The EC2 Infrastructure Services organization is responsible for making EC2 instances available to our customers at all times. We are a key part of what makes EC2 elastic. AI infrastructure has taken a key place in EC2, and we are building systems, services, and automation to operate it at scale. The Software Development Engineer will design, build, and maintain cloud-based provisioning and recovery systems for AWS Trainium-based AI UltraServers. This role requires expertise in AWS services and system architecture, as well as cross-functional collaboration with Capacity Management, Hardware Engineering, and Datacenter Operations to manage AI/ML infrastructure.

Key job responsibilities:

  • Build and maintain scalable microservices.
  • Design systems that solve the business problem efficiently.
  • Work in environments where the technology strategy is defined but the solution design is not.
  • Build cloud-based solutions using AWS native services for scaling infrastructure frameworks.
  • Create observable systems with appropriate metrics and alarming.
  • Collaborate with customers and stakeholders to convert business needs into technical designs.
  • Participate in code reviews and technical assessments.

About the team:

The EC2 UltraServer Provisioning team is a high-performing engineering organization responsible for delivering AWS Trainium-based UltraServer infrastructure at scale. We manage end-to-end provisioning workflows, from host ingestion through testing, repair, and recovery.

Engineering