Source Job

20 jobs similar to Safety Project — Election Neutrality

Jobs ranked by similarity.

US

  • Interact with generative AI models and project guidelines.
  • Create prompts to test model behavior across safety categories.
  • Document model breakability and effort level.

Welo Data provides AI services and specializes in data annotation. We foster a collaborative and innovative culture where employees contribute to cutting-edge AI safety evaluation.

US

  • Interact with generative AI models using project-provided guidelines, safety taxonomies, and attack-vector guidance.
  • Create and evaluate prompts designed to test model behavior across safety-related categories.
  • Identify where model responses become unsafe, noncompliant, inconsistent, or otherwise problematic.

Welo Data is an AI services company that specializes in data annotation. They deliver multilingual content transformation services in translation, localization, and adaptation for over 250 languages with a growing network of over 400,000 in-country linguistic resources.

Global

  • Develop and operate a system that combines ontologies, knowledge graphs, defeasible argumentation frameworks, and LLM-assisted population pipelines.
  • Implement defeasible argumentation frameworks that capture both logical structure and vulnerability to rebuttal.
  • Architect agent coordination patterns for multi-step research and population tasks, with robust error handling and graceful degradation.

CARMA works to help society navigate the complex and potentially catastrophic risks arising from increasingly powerful AI systems. They are a fiscally-sponsored project of Social & Environmental Entrepreneurs, Inc., a 501(c)(3) nonprofit public benefit corporation with a mission to lower the risks to humanity and the biosphere from transformative AI.

Australia

  • Engage in conversations with a real-time speech-to-speech AI model.
  • Evaluate performance based on speech recognition, audio quality, conversation flow, and content accuracy.
  • Provide accurate and consistent ratings based on project guidelines.

Appen is a company that focuses on improving real-time conversational AI. They leverage independent contractors and project-based opportunities to enhance multilingual voice interactions. They seem to foster a community-driven environment.

$15/hr
US

  • Identify and label languages and dialects from model-generated responses.
  • Review outputs from two different AI models and determine which model correctly identified the proposed language.
  • Compare model responses and select the appropriate evaluation outcome from predefined options.

RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.

Global

  • Contribute to AI training through annotation, evaluation, and prompt creation from anywhere.
  • Use your native Serbian fluency to help build more accurate and inclusive AI.
  • Join a global network of contributors with flexible projects that fit your schedule.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train advanced AI systems. They offer flexible, remote opportunities for a diverse community in 100+ countries, building smarter, more human AI.

Europe US 4w PTO

  • Continuously explore emerging shifts in AI interfaces, orchestration, agents, and autonomy through hands-on experimentation and ecosystem research.
  • Rapidly prototype, validate, and launch new AI-native product ideas with minimal support and high autonomy.
  • Use structured thinking, research, and experimentation to evaluate what n8n should invest in over the next 1–3 years.

n8n is the open workflow orchestration platform built for the new era of AI. They give technical teams the freedom of code with the speed of no-code, so they can automate faster, smarter, and without limits. Since their founding in 2019, they’ve grown into a diverse team of over 260 working across Europe and the US.

$45/hr
US Canada

  • Evaluate and improve model safety: Label, rank, audit, and refine human- and model-generated text to improve safety, quality, and policy alignment.
  • Apply nuanced safety judgment: Assess model outputs against detailed safety guidelines, rubrics, and style standards, making consistent decisions across ambiguous, sensitive, and context-dependent cases.
  • Create prompts and safety test cases: Write realistic prompts, user scenarios, and adversarial examples that help evaluate model behavior across safety categories and uncover unsafe, evasive, over-refusing, or policy-inconsistent responses.

Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers passionate about their craft, believing that a diverse range of perspectives is a requirement for building great products.

Mexico

  • Review contributor evaluations of model-generated responses for accuracy and adherence to guidelines.
  • Verify that contributors apply rubric dimensions and identify factual errors or missing information.
  • Provide detailed feedback to improve contributor performance and ensure high-quality evaluations.

Welo Data is an AI services company specializing in data validation. They operate as a freelancer platform with remote opportunities.

  • Audit and evaluate chatbot conversations based on core dimensions.
  • Follow project-specific guidelines for accurate evaluations.
  • Use a proprietary client platform to complete tasks.

RWS Group provides technology-enabled language, content management and intellectual property services. They embrace DEI and promote equal opportunity; they are an Equal Opportunity Employer and prohibit discrimination and harassment of any kind.

$115,000–$130,000/yr
US 4w PTO

  • Write, iterate, and maintain system prompts and instruction sets for Noodle’s AI agents across the student journey.
  • Build and maintain evaluation frameworks to measure agent accuracy, tone, hallucination rate, task completion, and alignment with rubric-based learning objectives.
  • Partner with Noodle teammates and university stakeholders to design, build, and test agents — translating learning objectives, operational flows, rubric assessments, and more into prompt-level agent instructions.

Noodle is higher education’s leading strategy, services, and technology partner that develops infrastructure, provides life-changing learning experiences, and grows the awareness of and the enrollment in some of the best academic institutions in the world. They empower universities to change the world by offering university partners various products and services.

US

  • Rate the performance of AI models or algorithms based on their output or behavior.
  • Label elements of content, assign categories, and evaluate quality or appropriateness.
  • Generate additional training data by transforming original data like text, images, or audio.

Innodata is a global data engineering company enabling responsible AI advancement. With over 36 years of experience, we deliver high-quality data and services to AI builders and adopters.

Asia

  • Engage in natural conversations with two AI models for 2-6 turns using provided scenarios.
  • Compare and rank the models based on given criteria after completing conversations.
  • Submit Pass/Partial/Fail votes for each model performance.

An enterprise client helps the world's most innovative companies improve their AI models by providing human feedback. They work with a large volume of freelancers on a contract basis and emphasize a culture of flexibility and remote collaboration.

$16–$20/hr
Europe

  • Review, refine, and validate AI translation prompts for attraction and travel content.
  • Optimize AI-generated translations to ensure naturalness, fluency, and cultural relevance.
  • Test language prompts to ensure the output meets the required standards.

Welo Data provides AI services. They focus on helping businesses leverage the power of artificial intelligence to improve their operations and create innovative solutions.

US

  • Perform annotation and labeling tasks for generative AI datasets, including text, image, video, and multimodal content.
  • Create, review, and evaluate prompts and responses across a variety of domains and use cases.
  • Conduct quality assurance reviews to ensure annotation accuracy, consistency, and adherence to guidelines.

Welo Data delivers multilingual content transformation services in translation, localization, and adaptation for over 250 languages. They drive innovation in language services, delivering high-quality training data transformation solutions for NLP-enabled machine learning, with a network of over 400,000 in-country linguistic resources.

Asia

  • Engage in natural conversations with two AI models and evaluate their performances.
  • Compare and rank models based on provided criteria after each dialogue.
  • Submit pass/fail votes for each model to help improve AI quality.

An enterprise client helps innovative companies improve their AI models through human feedback. They are seeking a high volume of freelancers for conversational AI training tasks.

Global

  • Contribute to training smarter, more inclusive AI through flexible, remote projects.
  • Work on annotation, evaluation, and prompt creation tasks tailored to your skills.
  • Join a global community of linguists and culturally aware contributors shaping safer AI.

Welo Data, part of Welocalize, is a global AI data company with over 500,000 contributors delivering high-quality, ethical data to train advanced AI systems. We are building a diverse community in 100+ countries, offering limitless opportunities for growth and contribution on your own terms.

Global

  • Evaluate Korean-English translations produced by an AI chatbot for accuracy and adherence to user-specified requirements.
  • Apply MQM error annotations, verify auto-generated rubric items, and evaluate each item as Pass or Fail.
  • Ensure consistent and accurate ratings while completing assigned tasks within given timelines.

Appen helps improve AI-powered translation systems through crowd-based evaluation projects. They are a large global company offering project-based independent contractor roles, fostering a culture of flexibility and remote work.

US Canada Mexico Australia New Zealand Argentina

  • Assess the factual accuracy, relevance, and quality of AI-generated Computer Science content.
  • Craft and answer domain-specific questions related to Computer Science and adjacent technical disciplines.
  • Evaluate and rank AI-generated responses based on technical correctness and reasoning quality.

The company is seeking Computer Science Experts with PhDs to support the training and evaluation of advanced AI models. This initiative focuses on improving the accuracy, reasoning, and domain expertise of generative AI systems through expert human feedback.

Europe

  • Review pre-written prompt instructions for tone, grammar, proper name handling, and measurements.
  • Add language-specific grammar or stylistic notes to enhance prompt accuracy.
  • Translate product-specific terms and cross-check against approved glossaries.

Welo Data provides AI services. They focus on general applications of AI services.