At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
In the rapidly expanding world of AI and HPC, robust system-level integration and validation are paramount. Our DCGPU solutions, including APUs and GPUs, demand rigorous testing for optimum performance and reliability. We are seeking a Principal Member of Technical Staff Systems Design Engineer with extensive silicon-to-systems understanding to drive complex system-level debugs to resolution. The Systems Design Engineering team encourages continuous technical innovation, both to showcase successes and to support ongoing career development.
The ideal candidate will bring systems design and validation engineering expertise to product development, validation, and root-cause resolution. You are an expert system-level engineer with an understanding of hardware/firmware interaction, system-level test scenarios, and methodologies for debugging complex system-level issues. A background in network systems validation, system-level networking telemetry and diagnostics, and related debug methodologies is beneficial. You will be part of a team that drives and improves AMD's ability to deliver the highest-quality, industry-leading technologies to market.
Develop a deep understanding of our silicon SoCs, board-level designs, system- and rack-level designs, and firmware/software/management stacks to drive the development of system stress tests and complex issue investigations. Lead the debug and triage of issues found during the silicon validation and production phases of our AI and HPC systems. Apply lessons from system debugs to strengthen test coverage and to enable strategies that accelerate issue debug and root-cause analysis. Work with multiple teams to develop and execute robust validation test plans at the functional, stress, and volume levels that reflect our customers' workloads and requirements. Contribute to technical innovation that improves AMD's capabilities across validation, including tool and script development, technical and procedural methodology enhancement, and various internal and cross-functional technical initiatives. Engage with our customers and partners on co-validation plans and priorities as well as critical issue debugs.
Programming/scripting skills (e.g. C/C++, Perl, Ruby, Python).
Extensive experience with board/platform-level debug, including delivery, sequencing, analysis, and optimization, with a structured approach to debug workflows.
Extensive knowledge of system architecture, technical debug, and validation strategy.
Strong analytical and problem-solving skills and pronounced attention to detail.
Extensive exposure to system-level integration and debug of SoC system-level test scenarios, including resets, RAS, system management, networking, workloads, or performance.
Must be a self-starter, able to independently drive tasks to completion.
Bachelor's or Master's degree in Computer Engineering or Electrical Engineering