- Design scenario-based and edge-case prompts to test AI behavior.
- Develop evaluation rubrics to assess AI responses across multiple criteria.
- Perform side-by-side evaluations of AI outputs and score them using defined criteria.
Jobs ranked by similarity.
CrowdGen, by Appen, focuses on AI response evaluation. They are looking for native Javanese speakers to contribute to a multilingual AI response evaluation project that involves reviewing large language model outputs.
Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems. They're building smarter, more human AI with a diverse community in 100+ countries.
Alignerr partners with leading AI labs to build expert-driven data pipelines. They improve how models reason, learn, and communicate by working with domain specialists to evaluate and refine AI systems where precision, pedagogy, and human judgment matter most.
Alignerr partners with leading AI labs to build expert-driven data pipelines that improve how models reason, learn, and communicate. They work with domain specialists around the world to evaluate and refine AI systems in areas where precision, pedagogy, and human judgment matter most.
Prolific is not just another player in the AI space: it is building the world's largest pool of quality human data. Over 35,000 AI developers, researchers, and organizations use Prolific to gather data from paid study participants with a wide variety of experiences, knowledge, and skills.
Alignerr collaborates with top AI labs to create expert-driven data pipelines that enhance how AI models reason, learn, and communicate. They partner with domain specialists worldwide to refine AI systems where precision, pedagogy, and human judgment are crucial.
Blueprint is a technology solutions firm headquartered in Bellevue, Washington, with a strong presence across the United States. They solve complicated problems, using technology to bridge the gap between strategy and execution, powered by the knowledge, skills, and expertise of their teams. They are bold, smart, agile, and fun.
Handshake connects students with early-talent recruiting. They offer the opportunity to evaluate what AI models produce and deliver feedback that strengthens the models' understanding of workplace tasks and language.
xAI aims to create AI systems that understand the universe and aid humanity. The team is small, motivated, and focused on engineering excellence; the organizational structure is flat, and all employees are expected to be hands-on.
1mind is a platform that deploys multimodal Superhumans for revenue teams, combining a face, a voice, and a GTM brain. The company has a remote-first, fast-moving culture with ownership, autonomy, and impact from day one.