Source Job

Latin America

  • Own the full QA lifecycle for Agentic AI products, including strategy, design, execution, reporting, and release sign-off.
  • Design and run test plans covering functional, regression, smoke, exploratory, and usability testing of AI behavior and decision chains.
  • Validate multi-step decision flows and reasoning to catch logic gaps, guardrail failures, or requirement mismatches.

GenAI LLM QA UX Testing

14 jobs similar to Product & UX Tester - Gen AI Latin America Corporate


$75,000–$90,000/yr
US

Serve as the embedded QA engineer on two pods (Jump’s cross-functional teams), collaborating with product managers to evaluate AI outputs, run exploratory and regression testing, and unblock engineers and PMs. Learn and track AI/ML quality signals, including golden datasets, prompt/regression suites, and metrics. Build dashboards for quality KPIs (defect escape rate, flake rate, regression coverage, MTTD/MTTR, AI eval scores) and drive continuous improvement.

Jump’s mission is to empower financial advisors and their clients to thrive in the age of AI.

Responsible for functional, regression, and end-to-end testing of steering and skill bots. Ensure that virtual agent solutions deliver accurate, consistent, and high-quality customer experiences. Validate and optimize bot performance by understanding conversation design, NLU behavior, and bot architecture.

Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation, supporting digital transformation for large enterprises.

India

  • Testing AI-based conversational products.
  • Monitoring and improving the quality assurance process, ensuring that agreed-upon standards and procedures are followed.
  • Evaluating models and identifying where accuracy improvements are required.

Netomi is the leading agentic AI platform for enterprise customer experience, working with the largest global brands to enable agentic automation at scale.

$85,000–$225,000/yr
US Canada

This role validates Veeva AI Agents through evaluation. You will define strategies for new AI Agents. The role involves analysis of model behaviors to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

$40,000–$60,000/yr
Global

  • Shape product roadmaps and user stories, prioritizing features for seamless experiences and market validation.
  • Partner with engineering, design, and PMs in fast sprints to prototype, test, and deploy capabilities.
  • Collect user feedback, run market research, and analyze metrics to refine backlogs and boost retention.

Wing is seeking elite talent to join M32 AI (backed by top-tier Silicon Valley VCs), dedicated to building agentic AI for SMBs globally.

US

  • Produce clear diagrams, documentation, and implementation plans for QA systems and processes.
  • Write test automation that validates the entire stack: front end, back end, and database.
  • Work closely with cross-functional teams, including support and customer success.

GovWorx is a mission-driven technology company dedicated to supporting public safety agencies through responsible AI solutions.

Design and implement agentic architecture, defining context management, data flow, and action orchestration. Build AI variables capable of autonomous action loops to enrich leads and trigger actions. Deliver Copilot v1, initially semi-agentic, with potential for autonomous workflows, while implementing monitoring of all output.

lemlist is a global B2B SaaS business with $43M ARR, fully bootstrapped, profitable, and growing fast, shipping one of the most loved Sales Engagement Platforms worldwide.

Mexico

  • Set client QA strategies and adapt to scope/volume changes.
  • Run root-cause analyses; drive CAPA plans with owners, timelines, and effectiveness checks.
  • Plan training & certification for raters/annotators and coordinators; track completion and impact.

Welo Data provides high-quality, ethically sourced, relevant, diverse, and scalable datasets to technology companies to supercharge their AI models. As a Welocalize brand, Welo Data draws on over 25 years of experience and brings together a curated global community of over 500,000 AI training and domain experts.

Canada

  • Shape AI-enabled development at Jane by setting a clear strategy for how engineers ideate, code, test, review, and ship with AI.
  • Prototype often, share what you learn, and model best practices by building small, high-impact tools that others can use.
  • Lead and support a small senior team while continuing to contribute technically, whether that means pairing with engineers, reviewing designs, or jumping into code when it matters most.

Jane is a team that's all about fostering growth, spreading delight, and serving our healthcare community by simplifying the lives of healthcare practitioners and patients daily.

North America Canada

Distill customer feedback into a cohesive product vision. Own end-to-end feature development by defining product requirements and managing development & testing. Maintain a perspective on the evolving generative AI landscape to feed product evolution.

ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®.

$200,000–$225,000/yr
US Unlimited PTO

  • Support the emerging product, Night Shift, an AI research assistant.
  • Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
  • Contribute to the system architecture for agentic AI, aiming for faster, more accurate leads for officers.

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.

Europe

  • Comprehensive testing of mobile (Android, iOS) and web applications in the eCommerce/qCommerce area.
  • Verification of the functionality of purchasing processes and payment paths.
  • Designing and implementing test cases and reporting errors.

We have been operating since 2021 as part of Żabka Future, a business unit of the Żabka Group whose mission is to create value by simplifying people's lives.

Europe

  • Contribute to building smarter, more inclusive AI systems.
  • Work on annotation, evaluation, and prompt creation projects.
  • Join a global network of linguists and language enthusiasts.

Welo Data, part of Welocalize, is a global AI data company with 500,000+ contributors delivering high-quality, ethical data to train the world’s most advanced AI systems.

$30–$35/hr
Global

  • Evaluate AI-generated Japanese speech and text for linguistic accuracy, naturalness, and educational quality.
  • Assess learner speech and writing across proficiency levels from CEFR Pre-A1 through B2+.
  • Apply expert judgment to identify learner errors, unnatural phrasing, and pedagogical gaps.

Alignerr partners with leading AI labs to build expert-driven data pipelines that improve how models reason, learn, and communicate. They work with domain specialists around the world to evaluate and refine AI systems in areas where precision, pedagogy, and human judgment matter most.