Lead the AI Evaluation team, owning staffing, coaching, performance management, and delivery of evaluation and testing frameworks.
Manage the AI evaluation lifecycle — including pre-launch testing, simulation, and post-deployment health monitoring — ensuring alignment with governance standards and expectations.
Create domain-specific evaluation tracks (e.g., Compliance & Risk, Bot Experience, Agent Experience) to assess AI quality from multiple perspectives.

To Thrive in This Role, You Have:

7+ years in AI/ML operations, quality, or evaluation, including at least 2 years of people leadership experience.
Deep understanding of LLM behavior, prompt testing, and evaluation methodologies.
Familiarity with human-in-the-loop frameworks and prompt testing tools.

Why This Role Matters:

This role creates the execution layer between AI experimentation and operational reality — ensuring governance standards are consistently applied and AI systems are safe, fair, and high-performing in production.
You’ll lead the teams that deliver the evaluation signals Operations relies on to trust every AI model deployed.

Chime

Chime is a financial technology company that believes everyone can achieve financial progress. They are a team of problem solvers, dreamers, and builders with one shared obsession: their members.
