✨ About The Role
- The Networking Engineer will design, manage, and optimize WAN and LAN infrastructure for OpenAI's supercomputers.
- Responsibilities include developing and maintaining data collection and monitoring systems to ensure network visibility and performance.
- The role involves troubleshooting and resolving network issues, including TCP/IP, BGP, and physical problems.
- Automating network issue detection and resolution to reduce operational overhead is a key aspect of the job.
- The candidate will work closely with hardware and systems engineers to meet the performance demands of distributed AI training workloads.
âš¡ Requirements
- The ideal candidate will have over 5 years of experience in networking or related infrastructure roles.
- Strong expertise in networking technologies, protocols, and design principles is essential for success in this position.
- Hands-on experience with troubleshooting complex networking issues in both LAN and WAN environments is required.
- A deep understanding of setting up TCP/IP networks from scratch, including BGP and ECMP routing, is crucial.
- The candidate should be skilled in collaborating across teams and possess excellent problem-solving and communication abilities.