Source Job

$150,000–$208,000/yr
US

  • Lead the AI Evaluation team, owning staffing, coaching, performance management, and delivery of evaluation and testing frameworks.
  • Manage the AI evaluation lifecycle — including pre-launch testing, simulation, and post-deployment health monitoring — ensuring alignment with governance standards and expectations.
  • Create domain-specific evaluation tracks (e.g., Compliance & Risk, Bot Experience, Agent Experience) to assess AI quality from multiple perspectives.

SQL, Python, Data Visualization, Communication, Program Management

20 jobs similar to Manager, AI Operations & Evaluation

Jobs ranked by similarity.

$111,888–$128,633/yr
Canada US

  • Design and build production-grade AI systems, including RAG pipelines, multi-step agents, and LLM-powered features.
  • Build comprehensive evaluation and observability frameworks to measure model accuracy, grounding, and quality drift.
  • Create production-quality Python services to wrap AI logic into secure microservices.

League, founded in 2014, is the leading healthcare consumer experience (CX) platform powered by AI, reaching over 63 million people globally. Payers, providers, and consumer health partners use League’s platform to deliver high-engagement healthcare solutions and improve health outcomes.

US

  • Support model launch readiness by running evaluations, monitoring and interpreting results, and surfacing regressions or unexpected behavior changes to relevant stakeholders
  • Partner closely with policy and domain experts throughout the evaluation lifecycle — from identifying risks and scoping the right evaluation approach, to coordinating creation of new evals and ensuring existing ones remain current with evolving policies, threat vectors, and model capabilities
  • Work with cross-functional stakeholders to help manage evaluation outcomes, including interpreting results and driving mitigations where needed

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. Their team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

$179,400–$250,000/yr
US

  • Lead development and delivery of scalable software systems supporting AI models.
  • Shape AI architecture from high-level vision to robust implementation.
  • Manage a small team of senior software engineers in a player/coach model.

Headspace offers access to lifelong mental health support, combining evidence-based content, clinical care, and innovative technology. Their values shape decisions, guide collaborations, and define their culture, building a more connected, human-centered team.

$185,000–$200,000/yr
US 4w PTO

  • Define and execute the roadmap for Sayari’s AI infrastructure.
  • Lead the strategy for model selection, fine-tuning, and deployment.
  • Establish the "ground truth" for Sayari AI.

Sayari is a venture-backed and founder-led global corporate data provider and commercial intelligence platform that serves financial institutions, legal and advisory service providers, multinationals, journalists, and governments. Their company culture is defined by a dedication to their mission of using open data to prevent illicit commercial and financial activity, and they embrace cross-team collaboration.

$177,000–$250,300/yr
US

  • Own Agent retrieval accuracy and relevance.
  • Drive automated resolution rates.
  • Manage AI safety and trust.

Airtable is the no-code app platform that empowers people closest to the work to accelerate their most critical business processes. More than 500,000 organizations, including 80% of the Fortune 100, rely on Airtable to transform how work gets done.

Canada Unlimited PTO

  • Own the delivery and outcomes of a cross-functional AI innovation team.
  • Manage and support the engineers on your team through regular feedback, coaching, and career development.
  • Drive planning and execution for the team’s roadmap, ensuring a healthy, well-prioritized backlog aligned with company goals.

Fullscript is a health technology company with a mission to help people get better by making it easier for practitioners to access products. They started in 2011 and now have over 125,000 practitioners using their platform and over 10 million patients relying on it.

$137,380–$164,959/yr
Canada 6w PTO

  • Build & operate GTM automation systems
  • Implement modular, scalable, multi-agent AI systems that operate 24/7 and integrate with marketing platforms
  • Create reusable workflow templates, playbooks, and “how-to” docs so partner teams can safely self-serve common automations

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open-source visualization tool. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack and thrive in an innovation-driven environment.

US Unlimited PTO

  • Architect and deploy autonomous AI agents and multi-agent workflows.
  • Design strict-source-following Retrieval-Augmented Generation (RAG) systems.
  • Build scalable backend services using FastAPI.

Osano is an innovative B-Corporation focused on giving modern enterprises the ability to innovate quickly and earn customer trust by respecting data privacy and complying with consent guidelines. They are scaling fast with a multi-year runway and ambitious growth plans.

US

  • Design and curate evaluation datasets for retrieval quality.
  • Measure retrieval quality using metrics like Recall@k, Precision@k, MRR, and NDCG@k.
  • Conduct systematic error analysis on AI/ML system outputs; build structured failure taxonomies.

Jump empowers financial advisors, firms, and clients to thrive in the age of AI by automating tasks like meeting prep and compliance. As a Series A company, Jump has raised $30M and grown to 100+ employees, including leaders from top companies and schools, fostering a culture of velocity, world-class standards, direct communication, and kindness.

  • Act as a "scouter" for the executive team, taking high-level goals and executing them from start to finish.
  • Personally design, configure, and deploy AI agents and automated workflows using systems such as Microsoft Copilot Studio or Power Automate.
  • Train staff on the tools you build and iterate based on their direct feedback.

Ennoble Care is a mobile primary care, palliative care, and hospice service provider with patients in multiple states. They provide a continuum of care for people with chronic conditions and limited mobility, striving to deliver the highest quality of care through a team patients know and trust.

$130,000–$210,000/yr
US

  • Partner with executive sponsors to identify high-impact use cases and turn them into measurable business outcomes.
  • Translate business needs into clear problem statements and success metrics; collaborate with Product and R&D.
  • Design and build AI agents with and for customers, rethinking business processes for maximum usability.

Jobgether connects job seekers with companies using an AI-powered matching process. They focus on ensuring applications are reviewed quickly and fairly, and they share top candidates with hiring companies.

  • Define and evolve the product vision and roadmap for Iris — our AI agent driving app growth.
  • Drive rapid iterations based on customer feedback and product data.
  • Work closely with engineering teams to deliver scalable, high-impact AI capabilities.

SplitMetrics is a global software company offering an ecosystem of products and services that serve as a growth engine for top mobile-first businesses worldwide. They have been at the forefront of the mobile marketing industry for almost 10 years and have a remote-first, supportive culture.

$160,000–$200,000/yr
US

  • Audit GTM workflows across sales, marketing, and CS to identify high-impact opportunities for AI and agentic automation.
  • Design and build agentic workflows using tools like Clay, Claude, and Hightouch that replace manual busywork and accelerate pipeline.
  • Lead enablement across GTM teams; train reps, marketers, and CSMs on new AI-powered workflows and drive sustained adoption.

Hightouch is the modern AI platform for marketing and growth teams. Its AI agents reimagine marketing workflows, allowing marketers to create content, plan campaigns, and execute strategies with transformational velocity and performance. The company is a leader in AI marketing and partners with industry leaders like Domino’s, Chime, Spotify, Ramp, Whoop, Grammarly, and over 1,000 others.

North America

  • Convert two priority business units (BUs) to AI-driven delivery within months.
  • Architect and deploy a scalable AI operating model across all BUs.
  • Embed AI into core workflows: delivery, operations, client lifecycle, and decision-making.

CORA Group, an operating group of Jonas Software under Constellation Software Inc., acquires, strengthens, and grows vertical market software companies. They operate with short decision lines and decentralized business unit leadership, expecting strong ownership of results.

US

  • You will define, build, and evolve foundational systems that enable autonomous agents to operate reliably in production.
  • You’ll explore new approaches, prototype quickly, and turn what works into durable platform foundations.
  • You’ll identify high-leverage architectural improvements, abstractions, and guardrails that expand what the platform can do while keeping it reliable, secure, observable, and maintainable under real-world conditions.

Kindo is an agent automation platform for DevOps and SecOps teams, helping organizations automate high-friction operational work using autonomous agents. They are a small, highly technical team with strong customer traction and real enterprise revenue, where engineers have direct ownership over critical systems.

Europe 6w PTO

  • Scope and implement AI Agent deployments, providing strategic advice and execution support to customers and partners.
  • Leverage knowledge of LLM internals to analyze customer requirements and design precise prompts for reliable, user-aligned behavior.
  • Fine-tune conversational flows and voice output to align with customer brand standards.

Parloa is a fast-growing startup in the world of Generative AI and customer service. They have over 400 employees in Berlin, Munich, and New York and are expanding globally.

$132,000–$170,000/yr
US

  • Define and own the product vision for AI-assisted performance and agentic workflows.
  • Build and execute a clear, outcome-driven product strategy and roadmap.
  • Partner closely with Engineering and Design to deliver intuitive, low-friction experiences.

15Five is the AI-powered performance management platform built for business impact. They empower HR leaders and transform managers into change-makers, accelerating engagement, performance, and retention, and they operate as a fast-paced, remote startup.

$150,000–$200,000/yr
US

  • Partner with executive sponsors and end users to identify high-impact use cases and turn them into measurable business outcomes on Glean.
  • Lead strategic reviews and advise customers on their AI roadmap, ensuring they get the most value from Glean’s platform.
  • Translate business needs into clear problem statements, success metrics, and practical AI solutions; collaborate with Product and R&D to shape priorities.

Glean is an innovative AI-powered knowledge management platform designed to help organizations quickly find, organize, and share information across their teams. The company's cutting-edge AI technology simplifies knowledge discovery, making it faster and more efficient for teams to leverage their collective intelligence.

$133,600–$233,800/yr
North America

  • Partner with Major Area Leaders to develop and track customer adoption plans.
  • Monitor product adoption metrics and customer health indicators using AI-driven insights.
  • Prepare executive-ready materials for business reviews and strategic planning sessions.

ServiceNow, founded in 2004, provides AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their cloud-based platform connects people, systems, and processes to empower organizations to work better; they aim to make the world work better for everyone.

North America Europe Middle East Africa

  • Lead AI & Machine Learning teams for high-impact initiatives aligned with company goals.
  • Own and evolve Zapier’s AI/ML strategy, ensuring scalable, reliable tools aligned with business needs.
  • Collaborate cross-functionally to enhance products/services by integrating AI capabilities across the organization.

Zapier builds and uses automation to make work more efficient, creative, and human. They are a fast-growing and remote-first company.