Job Description

We’re looking for an AI/ML Evaluation Engineer to drive the accuracy, reliability, and performance of next-generation AI systems. You’ll build evaluation pipelines, metrics, datasets, and automation that ensure model outputs are consistent, safe, and aligned with real-world expectations. This role is fully technical and highly collaborative, working closely with AI engineers, QA, data scientists, and product leaders.

Responsibilities include writing Python and SQL scripts to evaluate outputs from large language models (LLMs), designing and implementing LLM-as-Judge evaluations, defining and calculating metrics, building and maintaining ground-truth datasets, and automating evaluation workflows. You’ll also analyze large unstructured datasets, diagnose failure modes, and produce clear reports. Throughout, you’ll collaborate with cross-functional teams, document processes, and keep evaluations reproducible.

About Truelogic

Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York, delivering top-tier technology solutions to companies of all sizes.
