As a Senior Research Scientist, you will lead research on novel reward model architectures and training approaches for RLHF. You'll develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches. Further, you will research techniques to detect, characterize, and mitigate reward hacking and specification gaming.

Responsibilities Include:

- Designing experiments to understand reward model generalization, robustness, and failure modes

- Collaborating with the Finetuning team to translate research insights into improvements for production training pipelines

- Contributing to research publications, blog posts, and internal documentation

Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.
