As a Senior Research Scientist, you will lead research on novel reward model architectures and training approaches for RLHF. You'll develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches. Further, you will research techniques to detect, characterize, and mitigate reward hacking and specification gaming.
Responsibilities Include:
- Designing experiments to understand reward model generalization, robustness, and failure modes
- Collaborating with the Finetuning team to translate research insights into improvements for production training pipelines
- Contributing to research publications, blog posts, and internal documentation
Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.