Senior Software Engineer, Machine Learning Inference

Develop and optimize inference software for large language models on NVIDIA GPUs
San Francisco Bay Area
Senior
$184,000 – $287,500 USD / year
22 hours ago
NVIDIA

A leading designer of graphics processing units (GPUs) for the gaming and professional markets, as well as system-on-a-chip units (SoCs) for the mobile computing and automotive markets.

Senior Software Engineer, TensorRT Team

At NVIDIA, we're at the forefront of innovation, driving advancements in AI and machine learning to solve some of the world's most challenging problems. We're seeking talented and motivated engineers to join our TensorRT team in developing the industry-leading deep learning inference software for NVIDIA AI accelerators.

As a Senior Software Engineer in the TensorRT team, you will be responsible for designing and implementing inference software optimizations to power AI applications on NVIDIA GPUs. If you're ready to take on challenging projects and make a significant impact in a company that values creativity, excellence, and collaboration, we want to hear from you!

What you'll be doing:

  • Design, develop, and optimize NVIDIA TensorRT and TensorRT-LLM to supercharge inference applications for datacenters, workstations, and PCs.
  • Develop software in C++, Python, and CUDA for seamless and efficient deployment of state-of-the-art LLMs and Generative AI models.
  • Collaborate with deep learning experts and GPU architects throughout the company to influence Hardware and Software design for inference.

What we need to see:

  • BS, MS, PhD or equivalent experience in Computer Science, Computer Engineering or a related field.
  • 8+ years of software development experience on a large codebase or project.
  • Strong programming proficiency in C++ (required), Rust, or Python.
  • Experience in developing Deep Learning Frameworks, Compilers, or System Software.
  • Excellent problem-solving skills, a passion for learning, and the ability to work effectively in a fast-paced, collaborative environment.
  • Strong communication skills and the ability to articulate complex technical concepts.

Ways to stand out from the crowd:

  • Experience in developing inference backends and compilers for GPUs.
  • Knowledge of Machine Learning techniques and GPU programming with CUDA or OpenCL.
  • Background in working with LLM inference frameworks such as TensorRT-LLM, vLLM, and SGLang.
  • Experience working with deep learning frameworks such as TensorRT, PyTorch, and JAX.
  • Knowledge of close-to-metal performance analysis, optimization techniques, and tools.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous and love a challenge, we want to hear from you. Come, join our team and help build the real-time, cost-effective computing platform driving our success in this exciting and quickly growing field.

Engineering