✨ About The Role
- The role involves owning LLM evaluation processes and methods, focusing on generating benchmarks that reflect real-world usage and safety vulnerabilities.
- The candidate will be responsible for generating high-quality synthetic data, curating labels, and conducting rigorous benchmarking.
- Delivering robust, scalable, and reproducible production code is a key responsibility.
- The position requires developing innovative methods for benchmarking LLMs to assess harmlessness and helpfulness.
- The candidate will have opportunities to co-author papers, patents, and presentations with the research team.
⚡ Requirements
- The ideal candidate will have domain knowledge in LLM evaluation and data curation techniques.
- Extensive experience in designing and implementing LLM benchmarking is essential, along with a comfort level in leading end-to-end projects.
- Adaptability and flexibility are crucial, as the candidate must be able to shift focus based on new findings in the community.
- A strong motivation to work on safe and responsible AI is important for success in this role.
- Previous research or projects in benchmarking LLMs will be highly regarded.