✨ About The Role
- The Engineering Manager will lead a team responsible for the reliability, performance, and scalability of hyperscale supercomputers.
- Responsibilities include developing strategies to monitor and maintain system health and to minimize downtime.
- The role involves close collaboration with hardware engineers, systems engineers, and researchers to address infrastructure challenges.
- The manager will drive the development of tools and automation for detecting and resolving hardware health-related issues.
- Building a team culture that prioritizes reliability, scalability, and performance while encouraging innovation is a key focus.
- The position is based in San Francisco, CA, with a hybrid work model requiring 3 days in the office per week.
⚡ Requirements
- The ideal candidate will have over 5 years of experience in engineering management, particularly in large-scale infrastructure roles.
- A strong background in managing and optimizing hardware and software in high-performance computing environments is essential.
- The candidate should be a seasoned technical leader who enjoys hands-on technical work while also leading teams to achieve peak performance.
- Excellent troubleshooting skills for complex system issues and a proactive approach to developing preventive solutions are crucial.
- A commitment to fostering diversity, equity, and inclusion within the team is highly valued.
- Strong communication skills are necessary to convey complex technical concepts to both technical and non-technical audiences.