USD/year
We are looking for an experienced engineer with expertise in evaluating Generative AI systems, particularly Large Language Models (LLMs), to help us build and evolve our internal evaluation frameworks, and/or integrate existing best-of-breed tools. This role involves designing and scaling automated evaluation pipelines, integrating them into CI/CD workflows, and defining metrics that reflect both product goals and model behavior.