Role Focus:
- Advance the science and engineering of making Claude a trustworthy searcher by defining hypotheses and designing experiments.
- Turn search post-training from a craft into a measurable science with cleanly isolated variables and reproducible signal.
- This work sits at the intersection of reinforcement learning, retrieval, and evaluation, shaping Claude's behavior in evidence-based settings.
Key Responsibilities:
- Own research direction end-to-end, from hypothesis formation to experiment design and training runs.
- Build controlled experiment infrastructure to study environmental factors and design evaluations that distinguish genuine reasoning.
- Drive optimization rigor through efficient experiment design and ablations, and set the team's experimental standards.
- Collaborate with researchers across post-training, RL infrastructure, and product to translate model behavior into training signals.
Qualifications:
- Must have an unusually rigorous, quantitative mindset and be an outstanding software engineer in Python.
- Must have repeatedly shipped real ML research, have good taste for which experiments are worth running, and operate well with high autonomy.
- Preferred experience includes hands-on RL with LLMs, a background in search/retrieval/RAG, and time spent in research-heavy environments.
- Prior published research on LLMs, RL, retrieval, or calibration is a plus, as is experience with distributed training systems.
Anthropic:
Anthropic creates reliable, interpretable, and steerable AI systems, with a mission to ensure that AI is safe and beneficial. The company is a quickly growing group of researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.