Identify and label languages and dialects from model-generated responses.
Review outputs from two different AI models and determine which model correctly identified the proposed language.
Compare model responses and select the appropriate evaluation outcome from predefined options.
RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.
Coordinate project workstreams to ensure on-time delivery against execution standards.
Deliver high-quality annotation and QA to establish quality benchmarks.
Analyze datasets to surface patterns and optimization opportunities.
Appen specializes in human-generated data to train, fine-tune, and evaluate models across generative AI, large language models, computer vision, and speech recognition. They have over 1 million contributors in over 200 countries supporting model pre-training, supervised fine-tuning, evaluation and benchmarking, safety and red teaming, and multilingual global expansion.
Reviewing, annotating, and testing AI outputs for grammatical accuracy.
Acting as a primary quality check to proactively identify and correct subtle cultural errors.
Analyzing task quality trends and developing educational resources for AI task outputs.
They are sourcing independent Language Alignment & Resource Partners to provide native-level Arabic language vetting and QA for a specialized AI data project. As a contractor, you will supply your own equipment, and company-sponsored benefits do not apply.
Evaluate AI responses across scenarios like general Q&A and web search results.
Perform side-by-side comparisons of AI-generated responses, judging accuracy and clarity.
Apply detailed guidelines, maintaining consistency and high-quality evaluations.
Blueprint Technologies is a technology solutions firm that helps organizations grow, transform, and innovate. They have a strong presence across the United States and are expanding across Latin America, with teams united by a shared passion for solving complex problems.
Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
Perform side-by-side (SBS) comparisons of AI-generated responses.
Identify nuances in tone, meaning, and cultural context in French.
Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). They are united by a shared passion for solving complex problems and bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.
Perform sampling and quality checks on annotated datasets to ensure adherence to annotation guidelines
Identify, log, and categorize annotation defects with severity levels, tracking corrective actions and rework tasks
Coordinate onboarding training, calibration sessions, and refresher training for annotators and reviewers
Welo Data is a multilingual data and evaluation partner for foundation labs and enterprises deploying GenAI systems globally, delivering human judgment, data infrastructure, and evaluation systems for reliable AI performance across languages and cultures. It operates with a global network of over 500,000 vetted experts across 300+ languages, leveraging a unified model led by specialized experts with proprietary identity and fraud-prevention frameworks to ensure accurate and culturally grounded datasets.
Perform side-by-side (SBS) comparisons of AI-generated responses.
Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
Apply detailed, scenario-specific annotation guidelines and maintain consistency and high-quality evaluations.
Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). Their people bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.
Lead, train, and manage our in-house data labeling team.
Define, execute, and continuously improve data annotation processes with a very high attention to detail.
Ensure high-quality data outputs and meet rigorous accuracy and consistency standards.
Reducto provides a complete toolkit for handling any workflow by understanding documents the way a human would. They have raised over $100M and partner with hundreds of companies, from leading AI teams to enterprise customers across FAANG and top trading firms.
Evaluate and improve model safety: Label, rank, audit, and refine human- and model-generated text to improve safety, quality, and policy alignment.
Apply nuanced safety judgment: Assess model outputs against detailed safety guidelines, rubrics, and style standards, making consistent decisions across ambiguous, sensitive, and context-dependent cases.
Create prompts and safety test cases: Write realistic prompts, user scenarios, and adversarial examples that help evaluate model behavior across safety categories and uncover unsafe, evasive, over-refusing, or policy-inconsistent responses.
Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers passionate about their craft, believing that a diverse range of perspectives is a requirement for building great products.
Analyze, evaluate, and improve the quality of multilingual text classifier outputs.
Partner with linguistics and engineering teams to develop and refine language-specific parsers.
Translate and validate taxonomy content from English to Portuguese.
Lightcast is a global leader in labor market insights with headquarters in Moscow, Idaho. They work with partners across six continents to help drive economic prosperity and mobility.
Perform annotation and labeling tasks for generative AI datasets, including text, image, video, and multimodal content.
Create, review, and evaluate prompts and responses across a variety of domains and use cases.
Conduct quality assurance reviews to ensure annotation accuracy, consistency, and adherence to guidelines.
Welo Data delivers multilingual content transformation services in translation, localization, and adaptation for over 250 languages. They drive innovation in language services, delivering high-quality training data transformation solutions for NLP-enabled machine learning, with a network of over 400,000 in-country linguistic resources.
Steward the end-to-end planning, execution, and delivery of the dataset development program.
Act as the primary liaison between ML engineers, GIS engineers, technical artists, product stakeholders, and external annotation vendors.
Manage vendor relationships, including onboarding, quality assurance, throughput tracking, and contract compliance.
NBCUniversal is a leading media and entertainment company that creates world-class content across film, television, and streaming, with global theme park destinations, consumer products, and experiences. They own brands like NBC, NBC News, and Peacock and operate industry-leading theme parks worldwide. They champion an inclusive culture and strive to attract and develop a talented workforce.
Independently transcribe audio files in your native language
Compare and validate parallel transcripts to produce a final “ground truth” version
Perform entity tagging based on detailed client guidelines
Digital Divide Data (DDD) is a BPO that delivers ML data solutions and content services to Fortune 500 companies and the world’s leading academic institutions. DDD is unique in its ability to deliver end-to-end data creation, curation, labeling, and annotation services.