Source Job

$75,000–$90,000/yr
US

Serve as the embedded QA engineer on two pods (Jump’s cross-functional teams), collaborating with product managers to evaluate AI outputs, run exploratory and regression testing, and unblock engineers and PMs. Learn and track AI/ML quality signals, including golden datasets, prompt/regression suites, and metrics. Build dashboards for quality KPIs (defect escape rate, flake rate, regression coverage, MTTD/MTTR, AI eval scores) and drive continuous improvement.

QA AI ML SaaS Data

15 jobs similar to QA Engineer for Generative AI

Jobs ranked by similarity.

Latin America

  • Own the full QA lifecycle for Agentic AI products, including strategy, design, execution, reporting, and release sign-off.
  • Design and run test plans covering functional, regression, smoke, exploratory, and usability testing of AI behavior and decision chains.
  • Validate multi-step decision flows and reasoning to catch logic gaps, guardrail failures, or requirement mismatches.

Wing is seeking elite talent to join M32 AI (a subsidiary of Wing, backed by top-tier Silicon Valley VCs), dedicated to building agentic AI for traditional service businesses.

$85,000–$225,000/yr
US Canada

This role validates Veeva AI Agents through evaluation. You will define strategies for new AI Agents. The role involves analysis of model behaviors to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

Mexico

  • Set client QA strategies and adapt to scope/volume changes.
  • Run root-cause analyses; drive CAPA plans with owners, timelines, and effectiveness checks.
  • Plan training & certification for raters/annotators and coordinators; track completion and impact.

Welo Data provides high-quality, ethically sourced, relevant, diverse, and scalable datasets to technology companies to supercharge their AI models. As a Welocalize brand, Welo Data leverages over 25 years of experience and brings together a curated global community of over 500,000 AI training and domain experts.

US

  • Produce clear diagrams, documentation, and implementation plans for QA systems and processes.
  • Write test automation that validates the entire stack: front end, back end, and database.
  • Work closely with cross-functional teams, including support and customer success.

GovWorx is a mission-driven technology company dedicated to supporting public safety agencies through responsible AI solutions.

Canada

  • Shape AI-enabled development at Jane by setting a clear strategy for how engineers ideate, code, test, review, and ship with AI.
  • Prototype often, share what you learn, and model best practices by building small, high-impact tools that others can use.
  • Lead and support a small senior team while continuing to contribute technically, whether that means pairing with engineers, reviewing designs, or jumping into code when it matters most.

Jane is a team that's all about fostering growth, spreading delight, and serving our healthcare community by simplifying the lives of healthcare practitioners and patients daily.

Mexico

  • Set program quality goals, roadmap, and operating rhythms.
  • Manage a team of Analysts and Coordinators; hire, coach, and run performance cycles.
  • Co-own client governance with Ops; align on scope, priorities, and changes.

Welo Data provides high-quality, ethically sourced, relevant, diverse, and scalable datasets to technology companies to supercharge their AI models. As a Welocalize brand, they bring together a global community of over 500,000 AI training and domain experts.

$200,000–$225,000/yr
US Unlimited PTO

  • Support the emerging product, Night Shift, an AI research assistant.
  • Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
  • Contribute to the system architecture for agentic AI, aiming for faster, more accurate leads for officers.

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.

$180,000–$230,000/yr
US

The AI Engineer develops and deploys agentic AI solutions for clients. Implements components for document processing, workflow automation, data retrieval, and structured output generation. Contributes to monitoring, logging, metrics, and guardrail configurations for agentic systems.

AHEAD builds platforms for digital business by weaving together advances in cloud infrastructure, automation and analytics, and software delivery.

North America Canada

Distill customer feedback into a cohesive product vision. Own end-to-end feature development by defining product requirements and managing development & testing. Maintain a perspective on the evolving generative AI landscape to feed product evolution.

ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®.

India

  • Testing AI-based conversational products.
  • Monitoring and improving the quality assurance process, ensuring that agreed-upon standards and procedures are followed.
  • Identifying where model accuracy needs improvement.

Netomi is the leading agentic AI platform for enterprise customer experience, working with the largest global brands to enable agentic automation at scale.

$172,000–$215,000/yr

  • Build and refine early-stage AI systems and frameworks.
  • Translate complex problems into structured engineering approaches.
  • Champion technical excellence and mentorship in AI development.

The AI Innovation Team at Mural is pioneering how generative AI transforms visual collaboration and decision-making.

  • Design, build, and maintain internal tools and comprehensive frameworks supporting unit, integration, API, and UI testing.
  • Architect and evolve the load, scale, and performance testing systems to understand system limits and verify system resilience.
  • Evaluate and implement AI-driven tools for automated test generation and maintenance.

ClickUp is on a mission to make the world more productive by equipping its engineering team with the right tools, frameworks, and best practices.

Design and implement agentic architecture, defining context management, data flow, and action orchestration. Build AI variables capable of autonomous action loops to enrich leads and trigger actions. Deliver Copilot v1, initially semi-agentic, with potential for autonomous workflows, while implementing monitoring of all output.

lemlist is a global B2B SaaS business with $43M ARR, fully bootstrapped, profitable, and growing fast, shipping one of the most loved Sales Engagement Platforms worldwide.

US

  • Build and maintain gen AI prompts aligned with ad formats and community dynamics.
  • Improve the quality and brand safety of model outputs across text, images, and video.
  • Partner with Product and Engineering to prioritize improvements and accelerate feature development.

Reddit is a community built on shared interests and trust, home to open conversations and one of the internet’s largest sources of information.