AI Inference Engineer - Remote Eligible

Port and optimize AI models for real-time edge inference on the Quadric platform
Burlingame, California, United States
Senior
1 week ago
Quadric

Delivers a unified hardware-software platform with edge processors and tools for efficient deployment of AI and computer vision workloads.

Quadric GPNPU AI Inference Engineer

Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can accelerate only a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.

Role:

The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The AI Inference Engineer will [1] port AI models to the Quadric platform; [2] optimize model deployment for efficient inference; [3] profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.

Responsibilities:

  • Quantize, prune and convert models for deployment
  • Port models to the Quadric platform using the Quadric toolchain
  • Optimize inference deployment for latency and speed
  • Benchmark and profile model performance and accuracy
  • Develop tools to scale and speed up deployment
  • Make improvements to the SDK and runtime
  • Provide technical support and documentation to customers and the developer community
