Source Job

US Canada Mexico Australia New Zealand Argentina

  • Assess the factual accuracy, relevance, and quality of AI-generated Computer Science content
  • Craft and answer domain-specific questions related to Computer Science and adjacent technical disciplines
  • Evaluate and rank AI-generated responses based on technical correctness and reasoning quality

AI Machine Learning Cybersecurity Distributed Systems

20 jobs similar to PhD Computer Science Expert for AI Training

Jobs ranked by similarity.

US

  • Interact with generative AI models and project guidelines.
  • Create prompts to test model behavior across safety categories.
  • Document model breakability and effort level.

Welo Data provides AI services and specializes in data annotation. We foster a collaborative and innovative culture where employees contribute to cutting-edge AI safety evaluation.

US

  • Interact with generative AI models using project-provided guidelines, safety taxonomies, and attack-vector guidance.
  • Create and evaluate prompts designed to test model behavior across safety-related categories.
  • Identify where model responses become unsafe, noncompliant, inconsistent, or otherwise problematic.

Welo Data is an AI services company that specializes in data annotation. They deliver multilingual content transformation services in translation, localization, and adaptation for over 250 languages with a growing network of over 400,000 in-country linguistic resources.

$3,850–$3,850/yr
US UK Canada

  • Fellows will use external infrastructure to work on an empirical project aligned with research priorities.
  • Projects aim to produce a public output, such as a paper submission.
  • Fellows receive mentorship and can access a shared workspace in Berkeley or London.

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. Their team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

$15–$15/hr
US

  • Identify and label languages and dialects from model-generated responses.
  • Review outputs from two different AI models and determine which model correctly identified the proposed language.
  • Compare model responses and select the appropriate evaluation outcome from predefined options.

RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.

Australia Canada France Germany Spain US Unlimited PTO

  • Design and build Claude skills, MCP integrations, and automated pipelines that transform internal knowledge into publication-ready docs with minimal manual intervention.
  • Act as the final reviewer for content produced by AI-assisted workflows and engineers, maintaining a high bar for technical accuracy and polish.
  • Define content structures and metadata standards that ensure our documentation is agent-consumable and machine-parseable.

Upsun, formerly Platform.sh, is the cloud application platform humans and robots love. They give developers, DevOps engineers, and platform teams the ability to build, ship, and scale confidently without wrestling with backend infrastructure.

$115,000–$130,000/yr
US 4w PTO

  • Write, iterate, and maintain system prompts and instruction sets for Noodle’s AI agents across the student journey.
  • Build and maintain evaluation frameworks to measure agent accuracy, tone, hallucination rate, task completion, and alignment with rubric-based learning objectives.
  • Partner with Noodle teammates and university stakeholders to design, build, and test agents — translating learning objectives, operational flows, rubric assessments, and more into prompt-level agent instructions.

Noodle is higher education’s leading strategy, services, and technology partner. It develops infrastructure, provides life-changing learning experiences, and grows awareness of and enrollment in some of the best academic institutions in the world. They empower universities to change the world by offering university partners various products and services.

US

  • Evaluate AI-generated coding interactions end-to-end.
  • Assess quality of explanations and reasoning.
  • Provide constructive feedback on outputs.

Jobgether is a platform that uses AI to match candidates with jobs. They ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

Latin America

  • Partner with full-stack and backend engineers on the features they are shipping, write tests that prove it works, and flag gaps early.
  • Help build and run evaluation pipelines for non-deterministic LLM outputs, prompt regression, model drift detection, and output quality scoring across the LiteLLM routing layer.
  • Test the Nango-based integration layer across connectors and the file ingestion pipeline including encryption, formatting edge cases, and audit trail continuity.

Peach Pilot transforms how businesses run with a platform that ingests everything about how a company operates and constructs a Company Brain. It is a funded early-stage AI startup headquartered in Atlanta, Georgia, with a working platform on live infrastructure.

Global

  • Review and interpret financial reports, B2B data, or regulatory filings to verify information accuracy.
  • Respond to specific prompts based on financial data to help AI models understand technical terminology and complex fiscal concepts.
  • Ensure that the outputs generated by AI systems align with professional financial standards and logical economic frameworks.

Prolific is building the biggest pool of quality human data in the world. Over 35,000 AI developers, researchers, and organizations use Prolific to gather data from paid study participants with a wide variety of experiences, knowledge, and skills. They connect researchers and companies with a global pool of participants, enabling the collection of high-quality, ethically sourced human behavioural data and feedback.

  • Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
  • Perform side-by-side (SBS) comparisons of AI-generated responses.
  • Identify nuances in tone, meaning, and cultural context in French content.

Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). They are united by a shared passion for solving complex problems and bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.

$16–$20/hr
Europe

  • Review, refine, and validate AI translation prompts for attraction and travel content.
  • Optimize AI-generated translations to ensure naturalness, fluency, and cultural relevance.
  • Test language prompts to ensure the output meets the required standards.

Welo Data provides AI services. They focus on helping businesses leverage the power of artificial intelligence to improve their operations and create innovative solutions.

US

  • Perform annotation and labeling tasks for generative AI datasets, including text, image, video, and multimodal content.
  • Create, review, and evaluate prompts and responses across a variety of domains and use cases.
  • Conduct quality assurance reviews to ensure annotation accuracy, consistency, and adherence to guidelines.

Welo Data delivers multilingual content transformation services in translation, localization, and adaptation for over 250 languages. They drive innovation in language services, delivering high-quality training data transformation solutions for NLP-enabled machine learning, with a network of over 400,000 in-country linguistic resources.

$45–$45/hr
US Canada

  • Evaluate and improve model safety: Label, rank, audit, and refine human- and model-generated text to improve safety, quality, and policy alignment.
  • Apply nuanced safety judgment: Assess model outputs against detailed safety guidelines, rubrics, and style standards, making consistent decisions across ambiguous, sensitive, and context-dependent cases.
  • Create prompts and safety test cases: Write realistic prompts, user scenarios, and adversarial examples that help evaluate model behavior across safety categories and uncover unsafe, evasive, over-refusing, or policy-inconsistent responses.

Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers passionate about their craft, believing that a diverse range of perspectives is a requirement for building great products.

$20–$20/hr
Europe

  • Review and refine AI translation prompts for attraction and travel content.
  • Optimize AI-generated translations to ensure naturalness and fluency.
  • Identify and flag issues that cannot be resolved through prompt tuning.

Welo Data provides AI services. It is a remote company that works with freelancers and seems to value collaboration and contribution from its community members.

Global

  • Reviewing, annotating, and testing AI outputs for grammatical accuracy.
  • Acting as a primary quality check to proactively identify and correct subtle cultural errors.
  • Analyzing task quality trends and developing educational resources for AI task outputs.

They are sourcing independent Language Alignment & Resource Partners to provide native-level Arabic language vetting and QA for a specialized AI data project. As a contractor, you will supply your own equipment, and company-sponsored benefits do not apply.

US Canada Mexico UK Spain

  • Design and evaluate high-difficulty Civil Engineering prompts that probe the limits of AI models
  • Identify reasoning errors, weaknesses, and failure modes in model outputs
  • Apply adversarial prompting techniques to surface gaps in model understanding

The company is developing the next generation of AI models. They value domain knowledge, attention to detail, and a passion for elevating data quality in AI systems.

$28–$28/hr
US

  • Review, refine, and validate AI translation prompts for attraction and travel content.
  • Optimize AI-generated translations to meet standards of naturalness and fluency.
  • Proofread bilingual descriptions for accuracy and readability, refining prompts as needed.

Welo Data provides AI services. We have an exciting community and are looking for collaborators.

Global

  • Perform side-by-side (SBS) comparisons of AI-generated responses.
  • Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
  • Apply detailed, scenario-specific annotation guidelines and maintain consistency and high-quality evaluations.

Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). Our people bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.

Australia

  • Engage in conversations with a real-time speech-to-speech AI model.
  • Evaluate performance based on speech recognition, audio quality, conversation flow, and content accuracy.
  • Provide accurate and consistent ratings based on project guidelines.

Appen is a company that focuses on improving real-time conversational AI. They leverage independent contractors and project-based opportunities to enhance multilingual voice interactions. They seem to foster a community-driven environment.

Global

  • Review English source documents alongside two machine-generated Urdu translations.
  • Evaluate both variants based on accuracy, fluency, and overall translation quality.
  • Select the preferred translation and provide a clear written justification for your assessment.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems. They are building smarter, more human AI with a diverse community in 100+ countries.