Source Job

$13–$15/hr
US, Canada, Europe, Australia, New Zealand

  • Create and answer questions to train AI models.
  • Review, analyze, and rank AI models' chains of thought for correctness and approach.
  • Provide clear, constructive feedback to improve AI-generated responses.

Customer Support, Quality Assurance, Analytics

20 jobs similar to AI Training Contractor

Jobs ranked by similarity.

US

  • You'll work with AI tools, test model outputs, and evaluate responses.
  • Document errors and gaps, and collaborate with our team.
  • Spot inconsistencies and provide structured feedback.

Project World Wide is involved in shaping the future of AI through training data. They seek motivated individuals to contribute to the development of cutting-edge AI systems.

Europe

  • Review and label content for sentiment, factual accuracy, and reasoning issues.
  • Evaluate model outputs across quality dimensions using scoring frameworks.
  • Validate automated assessments and identify discrepancies or errors.

Welo Data provides AI services that help develop and evaluate large language models (LLMs).

Global

  • Challenge AI models on realistic educational scenarios.
  • Validate whether the model's understanding of pedagogical concepts reflects best-in-class teaching practice.
  • Evaluate AI outputs for clarity and correctness, analyze subtle reasoning errors, and document gaps in logic.

The company is seeking independent Instructional Experts with hands-on experience teaching, tutoring, or building curriculum to train AI models. As a contractor you’ll supply a secure computer and high-speed internet; company-sponsored benefits such as health insurance and PTO do not apply.

US

  • Evaluate AI-generated responses for accuracy, grammar, and cultural relevance.
  • Identify issues and provide refined, high-quality rewritten responses.
  • Create natural prompts and responses in English to improve conversational datasets.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems. They build smarter, more human AI with a diverse community in 100+ countries.

$177,000–$250,300/yr
US

  • Own Agent retrieval accuracy and relevance.
  • Drive automated resolution rates.
  • Manage AI safety and trust.

Airtable is the no-code app platform that empowers people closest to the work to accelerate their most critical business processes. More than 500,000 organizations, including 80% of the Fortune 100, rely on Airtable to transform how work gets done.

Global

  • Engage the model with investment scenarios, analytical questions, and market-based reasoning tasks; verify factual correctness and financial logic.
  • Assess the validity of investment reasoning; capture reproducible error traces; and provide structured feedback to improve prompts, evaluation frameworks, and analytical depth.
  • Identify where models oversimplify market behavior or misinterpret financial data.

They are evolving large-scale language models from simple conversational tools into systems capable of analyzing financial markets, interpreting investment strategies, and supporting decision-making across asset classes. They seem to have a growing team.

Global

  • Use financial analysis, modeling, and advisory experience to evaluate AI content.
  • Provide feedback to help AI understand financial concepts.
  • Work independently on a flexible schedule with no minimum hour requirement.

Handshake connects students and employers. Through Handshake, finance professionals help AI better understand financial concepts, quantitative reasoning, industry terminology, and professional communication.

$40–$100/hr
Global

  • Migrate and test existing bulk flashcard creation prompts.
  • Run test suites and manually review AI outputs for quality and correctness.
  • Analyze real user data to identify failure patterns and improve prompts.

Brainscape is the world's leading web & mobile EdTech study platform. They help millions of learners create better flashcards, and they are looking for an AI Prompt Engineer to join their team.

Australia, New Zealand

  • Building a truly flexible and scalable conversational AI platform.
  • Fine-tuning and evaluating LLM-based models to improve performance.
  • Contributing to platform engineering across both ML and backend systems.

Canva is a design platform that allows users to create social media graphics, presentations, posters, documents, and other visual content. They have a campus in Sydney, a second campus in Melbourne, and co-working spaces in Brisbane, Perth, Adelaide, and Auckland, NZ.

Global

  • Email inbox management
  • Assist in the recruitment process of new employees
  • Administrative tasks

We are seeking a reliable and detail-oriented admin assistant to support us with general administrative tasks on a full-time basis. For the right candidate, the position offers long-term potential.

$90,000–$110,000/yr
US

  • Design and refine prompts that power AI features.
  • Test AI outputs for accuracy and customer value.
  • Identify and resolve issues with AI behavior.

CentralReach is a leading provider of autism and IDD care software for Applied Behavior Analysis (ABA), multidisciplinary therapy, and special education. With over 200,000 users and backed by Roper Technologies, Inc., they are entering an exciting phase of growth and innovation.

Global

  • Evaluate AI-generated content using your biological training.
  • Provide feedback to help AI better understand biological reasoning.
  • Work on a flexible, asynchronous schedule with no minimum hour requirement.

Handshake AI utilizes AI technology. They value expertise in biological reasoning, experimental design, data interpretation, and scientific problem-solving.

US

  • Review contributor evaluations of model-generated responses to ensure adherence to project-specific guidelines.
  • Verify that contributors consistently apply all instructions and evaluation criteria when assessing model responses.
  • Confirm that contributors accurately identify factual errors, hallucinations, or missing information in model responses.

Welo Data, part of Welocalize, is a global AI data company delivering high-quality, ethical data to train the world’s most advanced AI systems. Welo Data has a diverse community in 100+ countries building smarter, more human AI, offering limitless opportunities for the global community to grow and contribute.

Europe

  • You will be matched with another participant for 1-on-1 verbal or text-based exchanges.
  • Use your natural Netherlands Dutch dialect to discuss various topics provided by the researcher.
  • Help the AI understand the nuances, slang, and cultural context of Netherlands Dutch through real-world interaction.

Prolific is building the biggest pool of quality human data in the world. Over 35,000 AI developers, researchers, and organizations use Prolific to gather data from paid study participants with a wide variety of experiences, knowledge, and skills.

Global

  • Challenge advanced language models on topics like verb conjugation and word order.
  • Verify factual accuracy and logical soundness, capturing reproducible error traces.
  • Suggest improvements to prompt engineering and evaluation metrics.

Global

  • Challenge advanced language models on business topics.
  • Evaluate logical consistency and real-world applicability.
  • Provide structured feedback to improve prompts and reasoning quality.

They are shaping the future of AI. They are looking for individuals familiar with areas such as business operations, management principles, organizational behavior, project management, marketing, entrepreneurship, supply chain fundamentals, and strategic planning.

Global

  • Participate in 15–60 minute recorded conversations.
  • Collaborate with the Data Operations team.
  • Contribute to high-quality conversational datasets.

Neon collaborates with prominent AI labs and tech companies to create premium conversational voice datasets, fostering advancements in speech and conversational AI. They seem to be a smaller company focusing on specialized data solutions.

US

  • Design and curate evaluation datasets for retrieval quality.
  • Measure retrieval quality using metrics like Recall@k, Precision@k, MRR, and NDCG@k.
  • Conduct systematic error analysis on AI/ML system outputs; build structured failure taxonomies.

Jump empowers financial advisors, firms, and clients to thrive in the age of AI by automating tasks like meeting prep and compliance. As a Series A company, Jump has raised $30M and grown to 100+ employees including leaders from top companies and schools, fostering a culture of velocity, world-class standards, direct communication, and kindness.

Global

  • Create and execute role-play–based evaluation scenarios that simulate realistic customer service interactions.
  • Contribute to the development of diverse and representative datasets used to assess conversational audio agents.
  • Evaluate model performance across a standardized set of qualitative and quantitative metrics.

They are dedicated to assessing and benchmarking advanced agentic audio models against leading systems. The program’s mission is to evaluate and optimize model performance for real-world customer support use cases.

Global

  • Make scripted and unscripted calls with an AI agent.
  • Produce clear, natural speech following provided guidelines.
  • Test and validate the AI’s ability to understand and interpret speech.

RWS is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at RWS are based on business needs, job requirements and individual qualifications.