EC2 Infrastructure Services organization is responsible for making EC2 instances available to our customers at all times. We are a key part of what makes EC2 elastic. AI infrastructure has taken a key place in EC2 and we are building systems, services, and automation to operate this at scale. The Software Development Engineer will design, build, and maintain cloud-based provisioning and recovery systems for AWS Trainium-based AI UltraServers. This role requires expertise in AWS services, system architecture, and cross-functional collaboration with Capacity Management, Hardware Engineering, and Datacenter Operations to manage AI/ML infrastructure.
Key job responsibilities:
About the team:
The EC2 UltraServer Provisioning team is a high-performing engineering organization responsible for delivering AWS Trainium-based UltraServers infrastructure at scale. We manage end-to-end provisioning workflows from host ingestion through testing, repair, and recovery.