JobMesh

Researcher, Misalignment Research

OpenAI · San Francisco, California, US

Job description

About the Team

Safety Systems sits at the forefront of OpenAI’s mission to build and deploy safe AGI, ensuring our most capable models can be released responsibly and for the benefit of society. Within Safety Systems, we are building a misalignment research team to focus on the most pressing problems for the future of AGI. Our mandate is to identify, quantify, and understand future AGI misalignment risks far in advance of when they can pose harm.

The work of this research taskforce spans four pillars:

Worst‑Case Demonstrations – Craft compelling, reality‑anchored demos that reveal how AI systems can go wrong. We focus especially on high-importance cases where misaligned AGI could pursue goals at odds with human well-being.

Adversarial & Frontier Safety Evaluations – Transform those demos into rigorous, repeatable evaluations that measure dangerous capabilities and residual risks. Topics of interest include deceptive behavior, scheming, reward hacking, deception in reasoning, power-seeking, and other related areas.

System‑Level Stress Testing – Build automated infrastructure to probe entire product stacks, assessing end‑to‑end robustness under extreme conditions. We treat...