Role Focus:

Advance the science and engineering of making Claude a trustworthy searcher by defining hypotheses and designing experiments.
Turn search post-training from a craft into a measurable science with cleanly isolated variables and reproducible signal.
This work sits at the intersection of reinforcement learning, retrieval, and evaluation, shaping Claude's behavior in evidence-based settings.

Key Responsibilities:

Own research direction end-to-end, from hypothesis formation to experiment design and training runs.
Build controlled experiment infrastructure to study environmental factors and design evaluations that distinguish genuine reasoning.
Drive optimization rigor through efficient experiment design and ablations, and set the team's experimental standards.
Collaborate with researchers across post-training, RL infrastructure, and product to translate model behavior into training signals.

Qualifications:

Must have an unusually rigorous, quantitative mindset and be an outstanding software engineer in Python.
Must have repeatedly shipped real ML research, have good taste for which experiments are worth running, and operate well with high autonomy.
Preferred experience includes hands-on RL with LLMs, background in search/retrieval/RAG, and experience in research-heavy environments.
Prior published research on LLMs, RL, retrieval, or calibration is a plus, as is experience with distributed training systems.

Anthropic:

Anthropic creates reliable, interpretable, and steerable AI systems, with a mission to ensure that AI is safe and beneficial. The company is a quickly growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
