Research Engineer, Frontier Evals & Environments

OpenAI · San Francisco, California, US

Job description

About the team

The Frontier Evals & Environments team builds north star model environments to drive progress towards safe AGI/ASI. This team builds ambitious environments to measure and steer our models, and creates self-improvement loops to steer our training, safety, and launch decisions. Some of the team's open-sourced evaluations include GDPval, SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer, and the team built and ran frontier evaluations for GPT4o, o1, o3, GPT 4.5, ChatGPT Agent, and GPT5. If you are interested in feeling firsthand the fast progress of our models, and steering them towards good, this is the team for you.

About you

We seek exceptional research engineers that can push the boundaries of our frontier models. Specifically, we are looking for those that will help us shape our empirical grasp of the whole spectrum of AI capabilities measurement and will own individual threads within this endeavor end-to-end.

In this role, you'll:

- Create ambitious RL environments to push our models to their limits
- Work on measuring frontier model capabilities, skills, and behaviors
- Develop new methodologies for automatically exploring the behavior of these mode...