For our client, we are looking for an ML Engineer with experience in building reinforcement learning (RL) environments and scalable model evaluation systems.

Our client is a leading provider of solutions for evaluating and optimizing AI systems. Many international companies use these solutions to improve AI agents and detect performance issues in large language models. The company operates at the intersection of AI, systems engineering, and AGI research.

On the team, you will have a real impact on how future AI models are trained, fine-tuned, and evaluated. You will be responsible for designing and implementing advanced RL environments, which is the key competency for this role. Your responsibilities will also include building scalable systems that shape the behavior of modern AI models. You will work in a small, highly specialized team of engineers and scientists.

Due to the client's time zone, we are looking for a candidate who can work until 5:00 pm, and occasionally until 6:00 pm.

Daily tasks:
- Designing and implementing RL environments that support large-scale experiments and agent evaluations
- Building pipelines for generating tasks, dynamic datasets, and simulated environments (of varying complexity and randomness)
- Creating reward models and verifiers that automatically evaluate model trajectories and reasoning
- Collaborating with infrastructure and systems engineers on scaling, telemetry, and environment reproducibility
- Designing APIs and frameworks for orchestrating experiments and for resetting and evaluating agents
- Optimizing environment performance, logging, and reward reproducibility in distributed settings

Requirements:
- 5 years of experience in data science, ML infrastructure, or related fields
- Very good knowledge of Python and systems programming
- Knowledge of reinforcement learning concepts (reward modeling, environment dynamics, agent verification and evaluation)
- Knowledge of monitoring, metrics, and data pipelines for RL evaluation
- Experience with RL or simulation frameworks (Gymnasium, PettingZoo, Isaac Gym, Ray RLlib, etc.)
- Experience in designing scalable task pipelines, browser simulations (e.g., Playwright, Selenium), or distributed computing frameworks
- English at a level that allows fluent communication in an international team

Nice to have:
- Interest in AI safety and AGI alignment

How we work and what we offer:
- We focus on open communication, both during recruitment and after hiring, and we make sure information about the process and the terms of employment is clear
- We take a humane approach to recruitment, so we keep our processes as simple and candidate-friendly as possible
- We work on a "remote first" basis: remote work is the norm for us, and we keep business trips to a minimum
- We offer private medical care (Medicover) and a Multisport card for contractors

How to apply?
Send us your application using the form!