The Microsoft Azure Artificial Intelligence/High Performance Computing (AI/HPC) team is looking for systems engineers, architects, and thought leaders to help customers deploy, monitor, profile, and debug their applications on hyperscale cloud infrastructure. Azure is enabling the largest supercomputing deployments to tackle complex computational problems in the public cloud, as evidenced by the HPC products that have already made their mark on the Top500, MLPerf, and Graph500 rankings.
The AI Systems Architect will work on all aspects of inference and training systems, focusing on system design, data center planning, and workload modeling across multiple GPU SKUs. The work will entail reliability modeling, lifecycle modeling and analysis of workloads and GPUs, GPU planning, analytical system design, and workload assignment. The successful candidate will focus on in-depth LLM modeling and on sharing results with cross-functional organizations to build a better understanding of software-hardware co-design features. The candidate must be able to demonstrate deep knowledge of AI systems and architectures for both training and inference SKUs across multi-vendor, multi-generational GPUs and models.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day, we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.