Design and develop coding benchmarks to evaluate frontier AI models.
Analyze AI-generated code for correctness, reliability, efficiency, and edge cases.
Build and maintain scalable data pipelines that support AI evaluation workflows.

Qualifications:

4+ years of professional software engineering experience with expert-level Python.
Experience with CI/CD pipelines and automated testing frameworks.
Strong understanding of software engineering best practices and code quality.

Work Environment:

Fully remote contract opportunity with compensation of $80–$100 USD per hour.
Expected workload is 10–39 hours per week, with weekly payments.
Hiring process includes a qualification form, evaluation, and technical interview.

Enterprise Client

The enterprise client is a leading AI platform that enables organizations to build intelligent applications through high-quality human feedback, AI evaluation, and model alignment. Selected consultants will work on improving frontier AI models. Company size and culture details are not specified.

Senior Python Developer (AI Evaluation & Benchmarking)
