Source Job

Global, Unlimited PTO, 17w maternity, 17w paternity

  • Create and curate an evaluation suite of real-world tasks for frontier AI models.
  • Rigorously evaluate AI systems, analyze results, and communicate findings.
  • Improve evaluation processes and potentially build out standalone benchmarks.

Analytical Thinking, Data Analysis, Written Communication, Python

10 jobs similar to Researcher, Evaluations

Jobs ranked by similarity.

Global, Unlimited PTO

  • Implement and maintain AI benchmarks using evaluation infrastructure like the Inspect library.
  • Contribute to the design and development of new benchmarks for frontier AI models.
  • Collaborate with researchers and engineers to ensure accurate and insightful evaluation data.

Epoch AI is a research institute investigating trends in machine learning and the economic consequences of AI. With a small, mission-driven team, we aim to provide rigorous, independent insights into AI development.

Global

  • Conduct detailed company, market, and competitor research and produce investment briefing reports.
  • Review legal and commercial documents, support due diligence, and analyze funding rounds and investor activity.
  • Manage operational projects independently, leverage AI tools for workflow efficiency, and prepare executive-level recommendations.

Assist World connects top talent with remote executive support roles. It is a growing company focused on AI-powered operations, offering a flexible, remote-first culture with competitive compensation and performance bonuses.

US, Unlimited PTO

  • Evaluate and select cutting-edge AI models to enhance product capabilities and user experience.
  • Design evaluation frameworks and configure observability for AI performance in production.
  • Collaborate with data science, CTO, and engineering teams to fine-tune and integrate AI models.

Vetcove modernizes veterinary software and pet healthcare with a procurement marketplace, home delivery ecommerce, and practice management system. Over 25,000 hospitals across all 50 states use the platform daily, and the company is backed by Y Combinator and top venture investors.

Global, Unlimited PTO, 12w maternity, 12w paternity

  • Build automated vendor intelligence pipelines that continuously collect and parse AI system cards, model benchmarks, security disclosures, and public vendor documentation.
  • Design synthesis systems that map disparate vendor information to our risk taxonomy, translating technical capabilities into governance-relevant risk signals.
  • Implement quality evaluation for generated risk profiles and create adaptive interpretation systems that adjust risk assessments based on organizational context.

Credo AI is a venture-backed company on a mission to empower organizations to responsibly build, adopt, procure and use AI at scale. Founded in 2020, Credo AI has been recognized as a Most Innovative Company of 2024 by Fast Company and a Technology Pioneer by the World Economic Forum.

Global

  • Scale the human data engine behind frontier AI systems.
  • Own onboarding, talent activation, and client success across high-velocity AI training deployments.
  • Operate at the intersection of data, operators, and enterprise delivery.

Our client is a rapidly growing, venture-backed AI company combining human expertise with machine learning workflows to build advanced AI systems. Backed by more than $40 million in funding and supported by an expanding international network of experts, the company is building critical human intelligence infrastructure for the AI economy.

US

  • Lead discovery interviews and workshops to understand client processes and identify AI and automation opportunities.
  • Build process models, drive Kaizen sessions, and develop business cases with clear value propositions and measurable outcomes.
  • Partner with technical teams to translate business needs into actionable backlogs, prototypes, and pilot programs.

Trissential connects top talent with meaningful opportunities driving innovation and growth, partnering with clients on cutting-edge AI initiatives. They offer a flexible, fully remote work culture that values curiosity, innovation, and continuous learning.

US

  • Review AI-generated compensation analyses and recommendations for accuracy and business relevance.
  • Identify errors, gaps, and areas for improvement in AI outputs.
  • Provide detailed written feedback to enhance AI agent performance.

Compa is an AI startup building a real-time compensation data platform for enterprise teams. It is a venture-backed company with a collaborative and driven culture.

US

  • Define and own the company-wide AI strategy, identifying high-impact use cases across products, operations, and customer experience.
  • Translate business problems into AI opportunities, driving pilots from concept to production with measurable outcomes.
  • Establish AI governance frameworks covering transparency, security, and regulatory compliance, and build internal AI literacy.

Xplor Technologies provides modern vertical software, embedded payments, and AI-powered capabilities to help businesses in fitness, recreation, and other service industries simplify operations and elevate customer experiences. With over 130,000 businesses across 72+ countries, the company processes over $47 billion in payments annually and fosters a culture of curiosity, empathy, and meaningful work.

Canada, 16w maternity, 16w paternity

  • Design, build, and deploy AI-powered product features leveraging LLMs to enhance user workflows.
  • Develop and maintain evaluation frameworks, guardrails, and monitoring systems for AI reliability.
  • Collaborate with cross-functional teams to integrate generative AI into backend services and APIs.

This company provides a modern SaaS platform that helps organizations strengthen security, compliance, and trust. They have a remote-first environment with a strong engineering culture focused on ownership and collaboration.

US, Unlimited PTO

  • Lead analyses translating data into actionable insights for post-campaign reports.
  • Build and maintain tagging models using prompt engineering and traditional ML methods.
  • Synthesize analyses across programs to inform strategy and contribute to organic social research.

Vocal Media is a company focused on measuring the impact of digital campaigns and uncovering insights for social impact. They are a fully remote team committed to diversity and equity, with a culture that values leadership and inclusivity.