View All Jobs 1888

Software Engineer, Supercomputing, HPC Infrastructure

Build and deploy supercomputers to support the world's largest AI models.
San Francisco Bay Area
Senior
$295,000 - 530,000 USD / year
1 week ago

✨ About The Role

- The role involves working closely with machine learning researchers to build and maintain OpenAI's compute and infrastructure - Responsibilities include deploying huge clusters using Kubernetes and Azure, and building an internal experiment platform for running/training large AI models - The team works at the cutting edge of speed and scale, combining High-Performance Computing (HPC) traditions in a modern cloud and containerized environment - The job requires experience in high-performance computing, open-source contributions, and running large Kubernetes clusters with GPU workloads - The successful candidate will have a direct impact on the success of OpenAI and the field of AI as a whole
- Experienced in designing, implementing, and running production services and highly available distributed systems - Proficient in working with highly performant bare-metal systems and large Kubernetes clusters with GPU workloads - Skilled in managing and monitoring large-scale infrastructure deployments and debugging problems across the stack - Comfortable with bash, Terraform, Python, and/or Chef, and have experience with cloud platforms like Azure, AWS, or GCP - Able to quickly obtain a deep technical understanding of new domains and identify important problems to solve
+ Show Original Job Post
























Software Engineer, Supercomputing, HPC Infrastructure
San Francisco Bay Area
$295,000 - 530,000 USD / year
Engineering
About OpenAI
Building artificial general intelligence