YOUR MISSION:

Build a scalable self-serve evaluation platform to power our research and development

RESPONSIBILITIES:

Design a Python framework that makes it easy for poolsiders to implement both internal and public benchmarks in a centralized way
Build and maintain the pipeline that runs distributed evaluations at scale
Collaborate with modeling and product teams to identify opportunities to improve our experimentation and evaluation tooling

SKILLS & EXPERIENCE:

Strong engineering background
Experience leading software projects cross-functionally
Experience building highly reliable and well tested services

PROCESS:

Intro call with one of our Founding Engineers
Technical interview(s) with one of our Founding Engineers
Team fit call with the People team

Poolside

Poolside aims to be the leading company in building a world where AI drives economically valuable work and scientific progress. They are a remote-first team across Europe and North America that meets in person for three days every month and gathers twice a year for longer offsites.