The Relevance Evaluation Specialist is responsible for assessing how effectively an AI-powered search engine surfaces and generates content in response to user queries. This role focuses on evaluating the alignment between user intent and system outputs, including documents, images, videos, and AI-generated answers. The evaluator applies structured judgment to determine whether retrieved or generated content satisfies the user's information need, using predefined relevance criteria and standardized scoring guidelines.
Core responsibilities include reviewing query–artifact pairs and assigning relevance scores on a five-point scale, ranging from not relevant to fully satisfying the user's intent. The specialist must analyze and infer user intent by examining query language, system-provided context such as timing and requester metadata, and any available interaction history. Accurate evaluation requires using the AI search engine's retrieval and assistant capabilities to reproduce queries, compare expected versus actual results, and determine the appropriateness of surfaced content.
In addition to document relevance, the role involves evaluating the quality of AI-generated responses. This includes assessing factual correctness, completeness, clarity, and whether the response appropriately addresses the inferred user intent. The evaluator must also determine whether the system behaves correctly when information is incomplete or unavailable, such as providing partial answers or explicitly acknowledging uncertainty. To support accurate assessments, the specialist is expected to independently research unfamiliar terminology, concepts, or domain-specific knowledge referenced in queries or responses.
The position requires the ability to formulate effective search queries and conversational prompts, including constructing and executing KQL queries against Elasticsearch indices when necessary to validate retrieval behavior. Evaluation decisions must be applied consistently across a high volume of tasks and documented clearly in accordance with established quality, audit, and compliance standards. Accuracy, consistency, and adherence to evaluation guidelines are prioritized over speed.
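As an illustration, a query of the kind this role might construct could resemble the following KQL (presumably Kibana Query Language) filter; the field names (`title`, `file_type`, `modified_date`) are hypothetical and would depend on the actual index mapping:

```
title:"quarterly report" and file_type:pdf and modified_date >= "2024-01-01"
```

A filter like this would let an evaluator confirm whether a document the system surfaced (or failed to surface) actually exists in the index and matches the inferred intent of the user's query.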
Qualified candidates must demonstrate strong English reading comprehension and written communication skills, the ability to reason through ambiguous or underspecified queries, and disciplined attention to detail. Familiarity with modern search behavior—including keyword-based, natural-language, and conversational querying—is required, along with comfort using web-based tools and internal evaluation platforms.
Preferred qualifications include prior experience in relevance labeling, search quality evaluation, content review, or data annotation; exposure to AI-driven search or information retrieval systems; and the ability to write basic Python scripts for task automation or data handling. Familiarity with Elasticsearch concepts or query languages, as well as experience evaluating technical, business, or operational content across varied domains, is considered beneficial.
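To give a sense of the "basic Python scripts for task automation or data handling" mentioned above, a minimal sketch is shown below. The export format and field names (`query_id`, `artifact_id`, `score`) are assumptions for illustration, not a real schema:

```python
# Hypothetical sketch: summarize relevance scores from an evaluation export.
# The CSV columns (query_id, artifact_id, score) are assumed, not a real schema.
import csv
import io
from collections import Counter

SAMPLE_EXPORT = """query_id,artifact_id,score
q1,d1,5
q1,d2,2
q2,d3,4
q2,d4,1
q2,d5,5
"""

def score_distribution(csv_text: str) -> Counter:
    """Count how often each relevance score on the five-point scale was assigned."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["score"]) for row in reader)

dist = score_distribution(SAMPLE_EXPORT)
print(dict(sorted(dist.items())))
```

A script along these lines could help an evaluator spot drift in their own labeling (for example, an unusually high share of extreme scores) before submitting a batch.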
This role operates within a rule-driven evaluation environment that emphasizes independent judgment within defined guidelines. Performance is measured by consistency, correctness, and alignment with relevance standards, with direct impact on the accuracy, usefulness, and reliability of the AI-powered search engine.
Salary Range: $23.49–$37.10 USD (Hourly)
Astreya offers comprehensive benefits to all Regular, Full-Time Employees, including: