View All Jobs 170797

Senior Software Engineer

Diagnose and troubleshoot failures in large-scale supercomputing systems for AI workloads
Remote
Senior
yesterday
Microsoft

Microsoft

A global technology leader known for its software products, cloud services, and hardware like Windows OS and Xbox consoles.

Senior Supercomputing Software & Systems Engineer

Microsoft Azure High Performance Computing & AI Engineering (HPC & AI Eng) team is responsible for managing the core platform & fleet of AI High Performance Computing products that customers use to run their most performant and demanding workloads. The AI Customer Experience (AICE) engineering team within the HPC & AI Eng. team is on the frontlines managing the flagship supercomputers used by top tier AI customers that enable breakthroughs such as ChatGPT and are highlighted in Top500, MLPerf and Graph500 rankings.

Operating at supercomputing scale requires specialized tools and techniques to ensure system reliability, runtime performance, and job health, while continuing to meet customer Service Level Agreements (SLAs). As a Senior Supercomputing Software & Systems Engineer, you will be responsible for diagnosing & troubleshooting the largest scale supercomputing systems across the infrastructure stack (GPU hardware, networking, datacenter and core software). In this role, you will develop and apply advanced tools, identify operational gaps, and implement features that support the smooth operation of cloud-native supercomputers. This opportunity will give you hands-on experience developing capabilities to manage the largest scale of supercomputers delivered to our customers.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

+ Show Original Job Post
























Senior Software Engineer
Remote
Engineering
About Microsoft
A global technology leader known for its software products, cloud services, and hardware like Windows OS and Xbox consoles.