Source Job

  • Review AI-generated responses and evaluate technical accuracy.
  • Provide expert feedback to train AI systems to write better code.
  • Work with various programming languages and coding challenges.

Python LLMs RLHF C# Java

20 jobs similar to AI Technical Evaluator

Jobs ranked by similarity.

  • Leverage professional experience to evaluate AI models' output in your field.
  • Assess content and deliver feedback to strengthen the model’s understanding.
  • Work independently from anywhere, with flexible hours and no minimum commitment.

Handshake is a recruiting platform that connects students and recent graduates with employers.

Europe

  • Contribute to building smarter, more inclusive AI systems.
  • Work on annotation, evaluation, and prompt creation projects.
  • Join a global network of linguists and language enthusiasts.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems.

US

Review AI-generated written content across multiple genres and formats, providing feedback. Use your expertise to help AI reason through writing challenges, including argumentation, structure, tone, and audience engagement. Identify biases, inaccuracies, or unclear passages in AI-generated outputs, and develop tests.

Labelbox builds the data engine that accelerates breakthrough AI, enabling safer, smarter models in production, and is trusted by leading research labs and enterprises worldwide.

Worldwide

  • Challenge advanced language models on realistic infrastructure and platform scenarios.
  • Verify architectural soundness and logical correctness, assess code quality and testing strategies.
  • Analyze performance bottlenecks and deployment risks, capture reproducible failure cases, and suggest improvements.

The company is hiring a SWE Infrastructure Specialist. This is a contract role: you will need to supply a secure computer and high-speed internet, and company-sponsored benefits such as health insurance and PTO do not apply.

$85,000–$225,000/yr
US Canada

This role validates Veeva AI Agents through systematic evaluation: you will define evaluation strategies for new AI Agents and analyze model behaviors to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

$28/hr
Europe

  • Review and evaluate AI-generated content to ensure accuracy, clarity, and proper source attribution.
  • Utilize linguistic expertise to create data and then evaluate the resulting AI-generated content.
  • Adhere strictly to detailed annotation and fact-checking guidelines provided in English.

RWS embraces DEI and promotes equal opportunity; it is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.

$80,000–$150,000/yr

  • Research, Document, Test, and Ideate: Explore the best ways to achieve our customers’ goals using LLMs and other AI tools.
  • Master Our Dialogue Platform: Become an expert, answer questions, and train others on prompting both within and outside of our platform.
  • Train Our AIs: Utilize prompting, knowledge-base creation, and fine-tuning to enhance our AI capabilities.

1mind is a platform that deploys multimodal Superhumans for revenue teams, combining a face, a voice, and a GTM brain. The company has a remote-first, fast-moving culture with ownership, autonomy, and impact from day one.

$150,000–$220,000/yr
US Unlimited PTO

  • Incorporating the best research work on agents and code generation into the OpenHands framework
  • Performing novel improvements in areas of interest to improve agent performance and efficiency
  • Running and implementing evaluations to ensure agent quality

OpenHands is building an open-source AI platform that empowers engineering teams to accelerate development, automate workflows, and integrate intelligent coding assistance into real-world software delivery. The company fosters a culture built on kindness, candor, autonomy, and learning.

$115,747–$208,344/yr
US

Design, implement, and refine prompts and conversational flows for agentic automation, ensuring high-quality consumer self-service experiences. Develop innovative, scalable components with JavaScript in a specialized scripting framework. Leverage Generative AI coding assistants to accelerate development, improve code quality, and enhance productivity throughout the software lifecycle.

Experian is a global data and technology company, powering opportunities for people and businesses around the world.

US

  • Fix bugs without writing code or waiting for engineering.
  • Build AI agents that handle entire support scenarios end-to-end.
  • Consult with users on achieving real outcomes, building infrastructure that scales, and collaborating with product and engineering.

We're building an AI‑native workspace—an operating system for work that puts co‑intelligence at the center.

US Canada

  • Review and evaluate AI-generated written responses.
  • Refine and rewrite responses to improve clarity, tone, and educational quality.
  • Create natural prompts and example dialogues to support training data needs.

Welo Data is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems.

$30–$35/hr
Global

  • Evaluate AI-generated Japanese speech and text for linguistic accuracy, naturalness, and educational quality.
  • Assess learner speech and writing across proficiency levels from CEFR Pre-A1 through B2+.
  • Apply expert judgment to identify learner errors, unnatural phrasing, and pedagogical gaps.

Alignerr partners with leading AI labs to build expert-driven data pipelines that improve how models reason, learn, and communicate. They work with domain specialists around the world to evaluate and refine AI systems in areas where precision, pedagogy, and human judgment matter most.

Europe 6w PTO

  • Design multi-step AI prompt chains to generate high-quality educational content.
  • Orchestrate and debug multi-step AI flows, managing the technical tooling.
  • Build and maintain automated AI workflows using platforms such as n8n.

Kognity is a 125-person EdTech scale-up powering learning in 120+ countries through its intelligent platform that combines rich pedagogy with smart AI.

  • Evaluate AI model outputs related to the instructional field.
  • Develop prompts for AI models reflecting field expertise.
  • Provide clear, structured feedback to enhance AI understanding.

Handshake is recruiting Instructional Coordinator Professionals to contribute to an hourly, temporary AI research project; no AI experience is needed.

Europe

Train and scale neural networks for processing source code. Develop new methods and improve existing ones for code generation, code editing, and agent-based workflows. Mentor colleagues on ML topics.

At JetBrains, code is their passion: since 2000, they've focused on reducing routine work so developers can spend more time building and shipping.

Japan

  • Participate in round-table style discussions about AI tools, including capabilities, weaknesses, cultural alignment, prompt behavior, and model differences.
  • Share real examples of how you use AI: coding, writing, document creation, design support, idea generation, manga/comic development, translation, etc.
  • Evaluate model outputs and provide detailed feedback on issues such as overly formal or informal tone, incorrect cultural references, or mismatched context.

With 27+ years of experience, Welo Data stands as a global leader in high-quality datasets and AI services.

  • Build AI agents and tools that transform how developers write code and debug issues.
  • Architect and implement AI-powered tools such as code review assistants and automated test generators.
  • Collaborate with the Principal Engineer and product/design teams in a remote-first environment.

Docker makes app development easier so developers can focus on what matters.