Jobs Similar to PhD Computer Science Expert for AI Training

Red Teaming | Generative AI Analyst

Welo Data 11 days ago

$47–$47/hr

US

Interact with generative AI models using project guidelines.
Create prompts to test model behavior across safety categories.
Document model breakability and effort level.

Welo Data provides AI services and specializes in data annotation. We foster a collaborative and innovative culture where employees contribute to cutting-edge AI safety evaluation.

Red Teaming | Generative AI Analyst

Welo Data 11 days ago

$33–$33/hr

US

Interact with generative AI models using project-provided guidelines, safety taxonomies, and attack-vector guidance.
Create and evaluate prompts designed to test model behavior across safety-related categories.
Identify where model responses become unsafe, noncompliant, inconsistent, or otherwise problematic.

Welo Data is an AI services company that specializes in data annotation. They deliver multilingual content transformation services in translation, localization, and adaptation for over 250 languages with a growing network of over 400,000 in-country linguistic resources.

Fellows Program — AI Safety

Anthropic 20 days ago

$3,850–$3,850/yr

US UK Canada

Fellows will use external infrastructure to work on an empirical project aligned with research priorities.
Projects aim to produce a public output, such as a paper submission.
Fellows receive mentorship and can access a shared workspace in Berkeley or London.

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. Their team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

Language Data Annotator (US)

RWS 17 days ago

$15–$15/hr

US

Identify and label languages and dialects from model-generated responses.
Review outputs from two different AI models and determine which model correctly identified the proposed language.
Compare model responses and select the appropriate evaluation outcome from predefined options.

RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.

Senior Content Engineer

Upsun 14 days ago

Australia Canada France Germany Spain US Unlimited PTO

Design and build Claude skills, MCP integrations, and automated pipelines that transform internal knowledge into publication-ready docs with minimal manual intervention.
Act as the final reviewer for content produced by AI-assisted workflows and engineers, maintaining a high bar for technical accuracy and polish.
Define content structures and metadata standards that ensure our documentation is agent-consumable and machine-parseable.

Upsun, formerly Platform.sh, is the cloud application platform humans and robots love. They give developers, DevOps engineers, and platform teams the ability to build, ship, and scale confidently without wrestling with backend infrastructure.

Prompt Systems Engineer

Noodle 10 days ago

$115,000–$130,000/yr

US 4w PTO

Write, iterate, and maintain system prompts and instruction sets for Noodle’s AI agents across the student journey.
Build and maintain evaluation frameworks to measure agent accuracy, tone, hallucination rate, task completion, and alignment with rubric-based learning objectives.
Partner with Noodle teammates and university stakeholders to design, build, and test agents — translating learning objectives, operational flows, rubric assessments, and more into prompt-level agent instructions.

Noodle is higher education’s leading strategy, services, and technology partner. It develops infrastructure, provides life-changing learning experiences, and grows awareness of and enrollment in some of the best academic institutions in the world. They empower universities to change the world by offering university partners various products and services.

Senior Software Engineer

Jobgether 27 days ago

US

Evaluate AI-generated coding interactions end-to-end.
Assess quality of explanations and reasoning.
Provide constructive feedback on outputs.

Jobgether is a platform that uses AI to match candidates with jobs. They ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

Sr QA Engineer (AI Systems & Platform)

Peach Pilot 27 days ago

Latin America

Partner with full-stack and backend engineers on the features they are shipping, write tests that prove those features work, and flag gaps early.
Help build and run evaluation pipelines for non-deterministic LLM outputs, prompt regression, model drift detection, and output quality scoring across the LiteLLM routing layer.
Test the Nango-based integration layer across connectors and the file ingestion pipeline including encryption, formatting edge cases, and audit trail continuity.

Peach Pilot transforms how businesses run with a platform that ingests everything about how a company operates and constructs a Company Brain. It is a funded early-stage AI startup headquartered in Atlanta, Georgia, with a working platform on live infrastructure.

Finance Professionals - AI Training

Prolific 21 days ago

Global

Review and interpret financial reports, B2B data, or regulatory filings to verify information accuracy.
Respond to specific prompts based on financial data to help AI models understand technical terminology and complex fiscal concepts.
Ensure that the outputs generated by AI systems align with professional financial standards and logical economic frameworks.

Prolific is building the biggest pool of quality human data in the world. Over 35,000 AI developers, researchers, and organizations use Prolific to gather data from paid study participants with a wide variety of experiences, knowledge, and skills. The platform connects researchers and companies with a global pool of participants, enabling the collection of high-quality, ethically sourced human behavioural data and feedback.

Labeler / Annotator – AI Response Evaluation (French)

Blueprint Technologies 26 days ago

$14–$16/hr

Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
Perform side-by-side (SBS) comparisons of AI-generated responses.
Identify nuances in tone, meaning, and cultural context in French.

Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). They are united by a shared passion for solving complex problems and bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.

Prompt Optimization

Welo Data 12 days ago

$16–$20/hr

Europe

Review, refine, and validate AI translation prompts for attraction and travel content.
Optimize AI-generated translations to ensure naturalness, fluency, and cultural relevance.
Test language prompts to ensure the output meets the required standards.

Welo Data provides AI services. They focus on helping businesses leverage the power of artificial intelligence to improve their operations and create innovative solutions.

Creative Writing Generative AI Analyst

Welo Data 10 days ago

$38–$38/hr

US

Perform annotation and labeling tasks for generative AI datasets, including text, image, video, and multimodal content.
Create, review, and evaluate prompts and responses across a variety of domains and use cases.
Conduct quality assurance reviews to ensure annotation accuracy, consistency, and adherence to guidelines.

Welo Data delivers multilingual content transformation services in translation, localization, and adaptation for over 250 languages. They drive innovation in language services, delivering high-quality training data transformation solutions for NLP-enabled machine learning, with a network of over 400,000 in-country linguistic resources.

Data Annotation Specialist

Cohere 11 days ago

$45–$45/hr

US Canada

Evaluate and improve model safety: Label, rank, audit, and refine human- and model-generated text to improve safety, quality, and policy alignment.
Apply nuanced safety judgment: Assess model outputs against detailed safety guidelines, rubrics, and style standards, making consistent decisions across ambiguous, sensitive, and context-dependent cases.
Create prompts and safety test cases: Write realistic prompts, user scenarios, and adversarial examples that help evaluate model behavior across safety categories and uncover unsafe, evasive, over-refusing, or policy-inconsistent responses.

Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers passionate about their craft, believing that a diverse range of perspectives is a requirement for building great products.

Prompt Optimization

Welo Data 12 days ago

$20–$20/hr

Europe

Review and refine AI translation prompts for attraction and travel content.
Optimize AI-generated translations to ensure naturalness and fluency.
Identify and flag issues that cannot be resolved through prompt tuning.

Welo Data provides AI services. It's a freelance-remote company that seems to value collaboration and contribution from its community members.

Language Alignment & Resource Partner (Arabic) - Freelance AI Trainer

Project World Wide 22 days ago

$6–$65/hr

Global

Reviewing, annotating, and testing AI outputs for grammatical accuracy.
Acting as a primary quality check to proactively identify and correct subtle cultural errors.
Analyzing task quality trends and developing educational resources for AI task outputs.

They are sourcing independent Language Alignment & Resource Partners to provide native-level Arabic language vetting and QA for a specialized AI data project. As a contractor, you will supply your own equipment, and company-sponsored benefits do not apply.

Civil Engineering Specialist

ABOUT THE ROLE 22 days ago

US Canada Mexico UK Spain

Design and evaluate high-difficulty Civil Engineering prompts that probe the limits of AI models.
Identify reasoning errors, weaknesses, and failure modes in model outputs.
Apply adversarial prompting techniques to surface gaps in model understanding.

The company is developing the next generation of AI models. They value domain knowledge, attention to detail, and a passion for elevating data quality in AI systems.

Prompt Creator

Welo Data 12 days ago

$28–$28/hr

US

Review, refine, and validate AI translation prompts for attraction and travel content.
Optimize AI-generated translations to meet standards of naturalness and fluency.
Proofread bilingual descriptions for accuracy and readability, refining prompts as needed.

Welo Data provides AI services. We have an exciting community and are looking for collaborators.

Labeler / Annotator – AI Response Evaluation (Chinese: Simplified & Traditional)

Blueprint Technologies 26 days ago

$14–$16/hr

Global

Perform side-by-side (SBS) comparisons of AI-generated responses.
Evaluate outputs based on accuracy, relevance, clarity, and instruction-following.
Apply detailed, scenario-specific annotation guidelines and maintain consistency and high-quality evaluations.

Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States and an expanding footprint across Latin America (LATAM). Our people bring diverse perspectives, deep expertise, and real-world experience across industries to help organizations grow, transform, and innovate.

Conversational Speech AI Evaluator; Australia (English)

Appen 10 days ago

$19–$19/hr

Australia

Engage in conversations with a real-time speech-to-speech AI model.
Evaluate performance based on speech recognition, audio quality, conversation flow, and content accuracy.
Provide accurate and consistent ratings based on project guidelines.

Appen is a company that focuses on improving real-time conversational AI. They leverage independent contractors and project-based opportunities to enhance multilingual voice interactions. They seem to foster a community-driven environment.

Translation Validator | Urdu India

Welo Data 7 days ago

$3–$3/hr

Global

Review English source documents alongside two machine-generated Urdu translations.
Evaluate both variants based on accuracy, fluency, and overall translation quality.
Select the preferred translation and provide a clear written justification for your assessment.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems. They are building smarter, more human AI with a diverse community in 100+ countries.
