✨ About The Role
- The role involves building robust, scalable, high-performance components that support distributed training workloads
- Responsibilities include delivering powerful APIs that orchestrate thousands of computers moving and persisting vast amounts of data
- The position requires designing and optimizing compute and data capabilities for scale across the Python and Rust stack
- The job involves deploying training frameworks to the latest supercomputers and rapidly responding to the changing needs of ML systems
- The focus is on maximizing productivity of researchers and hardware to accelerate progress towards Artificial General Intelligence (AGI)
⚡ Requirements
- Experienced in optimizing end-to-end systems, with an understanding of high-performance I/O to maximize local and distributed performance
- Strong software engineering skills with proficiency in Python and Rust
- Thrives in a fast-paced environment, continuously coming up with ideas to enhance system speed while minimizing complexity
- Enjoys working on large distributed systems and is passionate about making systems faster and more efficient
- Comfortable deploying training frameworks to supercomputers and adapting to the evolving needs of ML systems