✨ About The Role
- Develop high-performance GPU kernels, focusing on hardware utilization and end-to-end application performance
- Apply aggressive optimizations to work around hardware limitations and develop low-precision algorithms for high performance
- Collaborate with ML engineers to create model architectures suitable for efficient training and inference
- Work with the scaling team to deploy kernels and manage training uptime
- Advise on HW/SW co-design and work with hardware vendors to optimize performance
⚡ Requirements
- Strong coding skills in C/C++ and Python, with a deep understanding of GPUs and/or other AI accelerators
- Experience with CUDA or a comparable accelerator programming language, and with maintaining ML accuracy using low-precision formats
- Ability to collaborate with infrastructure and ML engineers, with at least 3 years of relevant industry experience
- Thrives on achieving performance improvements and has a background in Computer Science and Engineering
- Enjoys writing GPU kernels and optimizing for efficiency and performance