
Senior AI Quality/Evaluation Engineer

Design and implement the evaluation infrastructure for AI platform quality and reliability
Prague, Czech Republic
Senior
International Data Group (IDG)


Provides technology media, data, and marketing services that connect tech buyers and sellers through insights, events, and demand generation.


Senior AI Quality/Evaluation Engineer

IDC is building the next generation of AI-powered intelligence platforms that transform how technology decisions get made. Our platform re-imagines the way decision-makers discover and interact with trusted research and data, and is foundational to IDC's future.

We are looking for a Senior AI Quality/Evaluation Engineer to establish the evaluation function for the platform's AI systems. You will design and build the evaluation infrastructure that ensures the platform produces accurate, well-sourced, high-quality responses. As the first hire in this function, you will initially work solo, so you must be able to operate independently, define your own roadmap, and build from scratch.

The platform's credibility depends on the quality of its AI-generated intelligence. You will build the automated test suites, regression detection systems, and evaluation frameworks that catch quality issues before they reach users. You will work closely with the product team to translate quality criteria into measurable, automatable test scenarios, and with the AI engineering team to ensure that every pipeline change is evaluated against rigorous standards.

What You'll Do

  • Design and build the evaluation infrastructure that ensures the platform's AI systems produce accurate, well-sourced, high-quality responses
  • Build automated test suites that validate answer quality across agent pipeline changes
  • Develop regression detection systems that catch quality degradation before it reaches users
  • Create evaluation frameworks that measure response accuracy, citation correctness, and source quality
  • Work closely with the product team to translate quality criteria into measurable, automatable test scenarios
  • Build cost and latency monitoring that tracks the operational efficiency of AI pipeline execution
  • Define evaluation standards and practices that scale as the platform and team grow

What You Bring

  • 6+ years of software engineering experience, with significant work in testing infrastructure, ML evaluation, or quality systems
  • Experience building evaluation or testing frameworks for LLM-based or ML-based systems
  • Understanding of how to measure response quality for generative AI: accuracy, groundedness, citation correctness, relevance
  • Proficiency in Python
  • Ability to operate independently and define your own roadmap as the first hire in this function
  • Experience working at the intersection of engineering and product, translating qualitative quality criteria into quantitative measurements
  • Experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, or custom)
  • Familiarity with LLM observability tools (e.g., Langfuse, LangSmith, Weights & Biases)
  • Background in statistical methods for quality measurement (significance testing, distribution analysis)
  • Experience building A/B testing or experimentation infrastructure
  • Background in search relevance evaluation or information retrieval metrics

Why This Role Stands Out

At IDC, your work helps shape how the world understands technology and where it goes next. You collaborate with curious, high-caliber colleagues who value rigor, integrity, and shared success. As the premier global provider of trusted technology intelligence, IDC equips business and technology leaders with the evidence they need to make confident decisions. Our insights inform strategy, investment, and innovation across industries and regions.

Recognized by IIAR as Analyst Firm of the Year for five consecutive years, IDC sets the standard for credibility and impact. With more than 1,000 analysts worldwide and a truly global perspective, we combine deep expertise with practical relevance. Here, your ideas matter, your voice is heard, and your contributions provide the insights leaders rely on every day. It is meaningful work, backed by a culture that supports growth, collaboration, and long-term career development with a globally respected brand.

What We Offer

  • Hybrid/remote work model (about 1-2 days in the office per month)
  • A position in a highly professional and globally respected market research and advisory firm, where initiative that leads to results is rewarded
  • Individualized culture: an environment where you can explore new areas outside your specialty and stay engaged with work you enjoy

Equal Opportunity Employer

IDC is committed to providing equal employment opportunities for all qualified persons. Employment eligibility verification is required. We participate in E-Verify.
