Job Description
Develop complex RL environments and sandbox systems used by top LLM teams to train coding agents. Move to a new codebase every 1–1.5 months, quickly learn its architecture, and design RL tasks that fit each environment. Build engineering challenges, task frameworks, scoring mechanisms, test suites, and mock infrastructure. Work across backend, testing, mocks, task evaluation, and, when required, light frontend. Navigate infrastructure components as needed to support environment functionality. Build internal tools and agent-driven systems that accelerate RL environment development and testing.
About Respond
This role is for one of our clients in the insurance industry.