Senior AI Software Engineer, LLM Inference Performance Analysis

Build and optimize compiler passes to enhance GPU inference efficiency for large language models
San Francisco Bay Area
Senior
$148,000 – $235,750 USD / year
NVIDIA

A leading designer of graphics processing units (GPUs) for the gaming and professional markets, as well as system-on-a-chip units (SoCs) for the mobile computing and automotive markets.

Senior AI Software Engineer

NVIDIA leads the generative AI revolution. We're now seeking an experienced AI Software Engineer to optimize LLM inference performance. Our team collaborates with compiler, kernel, hardware, and framework teams to assess bottlenecks, create optimization methods, and validate improvements. If you're passionate about system-level performance, compiler IR, and GPU kernel optimization for deep learning inference, we'd love to consider you for our team.

What You'll Be Doing

  • Analyze the performance of LLMs on NVIDIA GPUs by employing advanced profiling and projection tools.
  • Identify performance-improvement opportunities in the IR-based compiler middle-end optimizer and/or in precompiled kernels, driven by graph IR transformations.
  • Design and implement new compiler passes and optimization techniques that deliver robust, high-quality, and maintainable compiler infrastructure and tools.
  • Collaborate closely with architecture teams to influence and co-design future hardware features that improve compiler and runtime efficiency.
  • Work with geographically distributed teams across compiler, hardware, kernel, and framework domains to drive performance improvements and resolve complex issues.
  • Contribute to a core team at the forefront of deep learning and LLM inference technology, spanning hardware architecture development, kernel optimization, and integration with higher-level deep learning frameworks.

What We Need To See

  • Master's or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • 5+ years of relevant experience.
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
  • Deep familiarity with modern LLM architectures, including inference optimization, profiling, and compiler-level performance tuning.
  • Significant background in IR-based kernel optimization and code generation, including graph transformations, fusion, and scheduling, as well as in developing custom kernel-generation frameworks such as OpenAI Triton or other compiler-based code-generation pipelines.
  • Hands-on experience with deep learning frameworks and runtimes such as TensorRT-LLM, vLLM, SGLang, JAX/XLA, or related compiler/runtime environments.
  • Proven ability to analyze and optimize LLM performance bottlenecks across model development, kernel execution, and runtime systems.
  • Excellent communication and collaboration skills, with the ability to work independently and effectively across distributed teams in a fast-paced environment.
  • A strong drive to continuously improve software and hardware performance through profiling, analysis, and optimization.
  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.

Ways To Stand Out From The Crowd

  • Demonstrated innovative use of agentic AI tools to enhance productivity and automate workflows.
  • Proven background in LLVM, MLIR, and/or Clang compiler development.
  • Active engagement with the open-source LLVM or MLIR community to ensure tighter integration and alignment with upstream efforts.

NVIDIA is recognized as one of the world's most desirable engineering environments, built by teams who value technical depth, innovation, and impact. We work alongside some of the best minds in GPU computing, systems software, and AI. If you're driven by performance, enjoy solving complex problems, and thrive in an environment that rewards initiative and technical excellence, we'd love to hear from you!