Responsibilities:
- Build novel, long-horizon evaluations
- Develop novel measurement approaches for understanding how model capabilities emerge and evolve during RL training
- Lead strategic evaluation coverage across the company
You may be a good fit if you:
- Have significant experience designing and running evaluations for large language models or similar complex ML systems
- Have led technical projects or teams, either formally or through sustained ownership of critical research directions
- Are equally comfortable designing experiments and writing code, moving fluidly between research and implementation
Representative projects:
- Designing and implementing a suite of long-horizon evaluations that test model capabilities on tasks requiring sustained reasoning, planning, and tool use over extended interactions
- Building systems to track capability development across RL training checkpoints, surfacing insights about when and how specific capabilities emerge
- Conducting a cross-org audit of evaluation coverage, identifying blind spots, and prioritizing new evaluations to fill critical gaps across Pretraining, RL, Inference, and Product
Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems, ensuring AI is safe and beneficial for users and society. They are a growing group of researchers, engineers, policy experts, and business leaders committed to building beneficial AI systems.