Senior Research Scientist, Reward Models

Anthropic

Salary range

$340,000–$425,000/yr

As a Senior Research Scientist, you will lead research on novel reward model architectures and training approaches for RLHF. You will develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches. You will also research techniques to detect, characterize, and mitigate reward hacking and specification gaming.

Responsibilities Include:

-Designing experiments to understand reward model generalization, robustness, and failure modes

-Collaborating with the Finetuning team to translate research insights into improvements for production training pipelines

-Contributing to research publications, blog posts, and internal documentation

Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.
