Senior Software Engineer (SDE III) - GPU Engineer

Glance AI is an AI commerce platform shaping the next wave of e-commerce with inspiration-led shopping, less about searching for what you want and more about discovering who you could be. Operating in 140 countries, Glance AI transforms every screen into a stage for instant, personal, and joyful discovery, where inspiration becomes something you can explore, feel, and shop in the moment.

Its proprietary models, seamlessly integrated with Google's most advanced AI platforms, Gemini and Imagen on Vertex AI, deliver hyper-realistic, deeply personal shopping experiences across fashion, beauty, travel, accessories, home décor, pets, and more. With an open architecture designed for effortless adoption across hardware and software ecosystems, Glance AI is building a platform that can become a staple in everyday consumer technology.

Glance AI partners with the world's leading smartphone makers, connected TV manufacturers, telecom providers, and global brands, meeting people where they are: on mobile, smart TVs, and brand websites. Part of the InMobi Group, a global technology and advertising leader reaching over 2 billion devices and serving more than 30,000 enterprise brands worldwide, Glance AI is backed by Google, Jio Platforms, and Mithril Capital.

About the Role

We are looking for a Senior Software Engineer (SDE III) who will build, profile, and optimize GPU workloads powering next-generation generative AI experiences, from Stable Diffusion image generation to transformer-based multimodal models. You'll work closely with research and infrastructure teams to make model inference faster, more cost-efficient, and production-ready.

This role is ideal for engineers passionate about pushing GPUs to their limits, writing high-performance kernels, and turning cutting-edge research into scalable systems.

Key Responsibilities

Develop, optimize, and maintain GPU kernels (CUDA, Triton, ROCm) for diffusion, attention, and convolution operators.
Profile end-to-end inference pipelines (data movement, kernel scheduling, memory transfers) to identify and resolve bottlenecks.
Apply techniques like operator fusion, tiling, caching, and mixed-precision compute to maximize GPU throughput.
Collaborate with researchers to productionize experimental layers or model architectures.
Build benchmarking tools and micro-tests for latency, memory, and throughput regressions.
Integrate kernel improvements into serving stacks, ensuring reliability and repeatable performance.
Work with platform teams to tune runtime configurations and job scheduling for GPU utilization.

Required Qualifications

4+ years of experience in systems or ML engineering, with 2+ years working on GPU or accelerator optimization.
Strong hands-on skills with CUDA programming, memory hierarchies, warps, threads, and shared memory.
Familiarity with profiling tools (Nsight, nvprof, CUPTI) and performance analysis.
Working knowledge of PyTorch, JAX, or TensorFlow internals.
Proficiency in C++ and Python.
Experience with mixed precision, FP16/BF16, or quantization.
Deep curiosity about system bottlenecks and numerical correctness.

Preferred Qualifications

Experience building fused operators or integrating custom kernels with PyTorch extensions.
Understanding of NCCL / distributed inference frameworks.
Contributions to open-source GPU or compiler projects (Triton, TVM, XLA, TensorRT).
Familiarity with multi-GPU / multi-node training and inference setups.
