Responsibilities:

Build novel, long-horizon evaluations
Develop novel measurement approaches for understanding how model capabilities emerge and evolve during RL training
Lead strategic evaluation coverage across the company

You may be a good fit if you:

Have significant experience designing and running evaluations for large language models or similar complex ML systems
Have led technical projects or teams, either formally or through sustained ownership of critical research directions
Are equally comfortable designing experiments and writing code—you can move between research and implementation fluidly

Representative projects:

Designing and implementing a suite of long-horizon evaluations that test model capabilities on tasks requiring sustained reasoning, planning, and tool use over extended interactions
Building systems to track capability development across RL training checkpoints, surfacing insights about when and how specific capabilities emerge
Conducting a cross-org audit of evaluation coverage, identifying blind spots, and prioritizing new evaluations to fill critical gaps across Pretraining, RL, Inference, and Product

Anthropic

Anthropic's mission is to create reliable, interpretable, and steerable AI systems, ensuring AI is safe and beneficial for users and society. They are a growing group of researchers, engineers, policy experts, and business leaders committed to building beneficial AI systems.

New Research Lead, Training Insights
