✨ About The Role
- Design and implement custom networking collectives that are integrated into the training stack
- Collaborate with ML researchers to ensure efficient collective operations in C++ and CUDA
- Work on simulations to inform future supercomputer network designs
- Ensure that the largest training jobs take full advantage of different network transports used in supercomputers
- Contribute to AI research progress at OpenAI by incorporating learnings from across the research organization into the training platform
⚡ Requirements
- Experience writing distributed algorithms using RDMA, and comfort with low-level, performance-sensitive CPU and/or GPU code
- Strong background in network simulation techniques is preferred
- Ability to collaborate closely with ML researchers to design and implement efficient collective operations in C++ and CUDA
- Experience with custom networking collectives and network transports used in supercomputers is a plus
- Thrives in a fast-paced environment and enjoys working on novel collective communication techniques