
ML Engineer (LLM)

Build real-time multilingual voice conversation pipelines with low latency and high quality
Berlin
Senior
3 weeks ago

The Role

We are looking for a hands-on ML Engineer who lives at the intersection of TTS, STT and large language models. You will design and ship new low-latency voice capabilities, working closely with product, research and infrastructure teams to push the boundaries of natural, multilingual conversation.

What You'll Do

  • Architect & implement real-time speech pipelines (ASR → LLM → TTS) that meet stringent latency and quality targets (a minimal sketch of such a pipeline follows this list).

  • Evaluate and fine-tune state-of-the-art ASR, LLM and TTS models—both commercial and open-source—and integrate the best performers into production.

  • Optimise inference through quantisation, distillation, hardware-aware graph compilation and reinforcement-learning-based tuning.

  • Expose scalable APIs & micro-services with Python/FastAPI, gRPC or WebSocket streaming, backed by robust observability and autoscaling.

  • Own deployment across cloud and on-prem environments, collaborating on containerisation (Docker), orchestration (Kubernetes) and CI/CD workflows.

  • Stay ahead of the curve by tracking research, running experiments and sharing learnings with the broader team.
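For context, here is a minimal sketch of one streaming voice turn over a FastAPI WebSocket, as referenced in the first bullet above. The transcribe, generate_reply and synthesize helpers are hypothetical placeholders for whichever ASR, LLM and TTS models are chosen, not any particular vendor API:

    from fastapi import FastAPI, WebSocket, WebSocketDisconnect

    app = FastAPI()


    async def transcribe(audio_chunk: bytes) -> str:
        # Hypothetical streaming ASR stage; returns a partial transcript.
        return "partial transcript"


    async def generate_reply(transcript: str) -> str:
        # Hypothetical LLM stage producing the agent's next utterance.
        return "reply to: " + transcript


    async def synthesize(text: str) -> bytes:
        # Hypothetical TTS stage; returns an encoded audio frame.
        return text.encode("utf-8")


    @app.websocket("/voice")
    async def voice_session(ws: WebSocket) -> None:
        # One ASR -> LLM -> TTS turn per incoming audio chunk,
        # streamed back over the same WebSocket to keep latency low.
        await ws.accept()
        try:
            while True:
                audio_chunk = await ws.receive_bytes()
                transcript = await transcribe(audio_chunk)
                reply_text = await generate_reply(transcript)
                reply_audio = await synthesize(reply_text)
                await ws.send_bytes(reply_audio)
        except WebSocketDisconnect:
            pass

In production, each stage would stream partial results rather than waiting for complete chunks; that overlap is where most of the latency budget is won.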

What We're Looking For

  • Python Engineering: 5+ years writing production-grade, well-tested Python; deep familiarity with async, typing and performance profiling

  • Speech / Audio: Hands-on experience building real-time ASR, TTS, voice chat or streaming audio products

  • LLM Tooling: Fine-tuning, prompt design, evaluation, retrieval-augmented generation; familiarity with frameworks such as OpenPipe/ART, LangChain, LlamaIndex or similar

  • Systems & MLOps: Containerisation, GPU scheduling, observability, DevOps on GCP or AWS; infrastructure-as-code principles

  • API Design: Building and maintaining high-throughput REST/gRPC/FastAPI services; securing and monitoring them in production

Bonus Points

  • Model compression expertise (quantisation, pruning, ONNX/TensorRT)

  • Knowledge of audio and acoustics

  • Experience with reinforcement learning from human feedback (RLHF) or direct preference optimisation (DPO)

  • Contributions to open-source ML/speech projects (share your GitHub!)

  • Familiarity with GPU inference servers (Triton, KServe) or distributed compute frameworks (Ray)

Founded in Berlin in 2023 by serial entrepreneurs Albert Astabatsyan, Hakob Astabatsyan, and Sassun Mirzakhan-Saky, Synthflow AI democratizes access to advanced voice AI with a no-code platform that lets enterprises easily create, deploy and scale natural-sounding, cost-effective voice agents tailored to their business needs.
