This role validates Veeva AI Agents through evaluation. You will define evaluation strategies for new AI Agents and analyze model behavior to identify defects.
Source Job
20 jobs similar to AI Data Engineer
Jobs ranked by similarity.
- Build infrastructure that enables quality monitoring across Design Generation.
- Guide Design Generation teams on how to evaluate their systems effectively.
- Build platforms that make evaluation accessible and automated.
Canva is a company that is redefining how the world experiences design.
- Support the emerging product, Night Shift, an AI research assistant.
- Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
- Contribute to the system architecture for agentic AI, with the goal of delivering faster, more accurate leads to officers.
Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.
Design and implement the agentic architecture, defining context management, data flow, and action orchestration. Build AI variables capable of autonomous action loops that enrich leads and trigger actions. Deliver Copilot v1, initially semi-agentic with a path to autonomous workflows, and implement monitoring of all outputs.
lemlist is a global B2B SaaS business with $43M ARR, fully bootstrapped, profitable, and growing fast, shipping one of the most loved Sales Engagement Platforms worldwide.
- Design complex LLM prompts that accurately represent real customer journeys and service interactions.
- Partner with Field Engineers to transform raw data into structured, high-quality tasks for model training.
- Annotate and review tasks to ensure strict quality standards and alignment with expected customer outcomes.
Welo Data works with technology companies to provide datasets that are high-quality, ethically sourced, relevant, diverse, and scalable to supercharge their AI models.
- Review AI-generated responses and evaluate technical accuracy.
- Provide expert feedback to train AI systems to write better code.
- Work with various programming languages and coding challenges.
G2i connects subject-matter experts, students, and professionals with flexible, remote AI training work such as annotation, evaluation, fact-checking, and content review.
Serve as the embedded QA engineer on two pods (Jump’s cross-functional teams), collaborating with product managers to evaluate AI outputs, run exploratory and regression testing, and unblock engineers and PMs. Learn and track AI/ML quality signals, including golden datasets, prompt/regression suites, and metrics. Build dashboards for quality KPIs (defect escape rate, flake rate, regression coverage, MTTD/MTTR, AI eval scores) and drive continuous improvement.
Jump’s mission is to empower financial advisors and their clients to thrive in the age of AI.
- Apply bleeding-edge AI theory to the design and implementation of large-scale data systems that feed AI agents and autonomous workflows.
- Use data science techniques to fine-tune, evaluate, and optimize LLMs for marketing-specific tasks.
- Build end-to-end automations using LLMs, internal data, and external signals to eliminate repetitive human tasks.
Rockerbox is building the next generation of marketing intelligence. They are looking for someone to help them build the AI systems everyone else just theorizes about.
- Own and execute a strategic roadmap for AI research, messaging, and context capabilities.
- Enhance Apollo's AI research agents to surface actionable insights from the web.
- Define how AI understands each user's business, transforming generic AI outputs into relevant recommendations.
Apollo.io is the leading go-to-market solution for revenue teams, trusted by over 500,000 companies and millions of users globally.
- Build FastAPI services, Pydantic models, evaluation tooling, integrations with device APIs, and guardrails to keep responses safe and useful.
- Drive implementation of Phase 2 of the AI Sleep Chat, targeting Q2 2026.
- Translate voice/text commands into device API calls, and design and implement a multi-agent architecture for the command-processing pipeline.
At Hatch, they’re on a mission to help people build better sleep habits—so they can feel more focused, energized, and present in their lives.
- Own the full QA lifecycle for Agentic AI products, including strategy, design, execution, reporting, and release sign-off.
- Design and run test plans covering functional, regression, smoke, exploratory, and usability testing of AI behavior and decision chains.
- Validate multi-step decision flows and reasoning to catch logic gaps, guardrail failures, or requirement mismatches.
Wing is seeking elite talent to join M32 AI (a subsidiary of Wing, backed by top-tier Silicon Valley VCs), dedicated to building agentic AI for traditional service businesses.
- Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows.
- Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
- Improve reliability, performance, and safety across existing Python codebases.
Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.
- Establish the technical vision for our AI product infrastructure.
- Develop frameworks that make LLM integration seamless and reliable, including APIs and SDKs that allow LLMs to interface with Wealthsimple data and functionality.
- Build new AI-powered product capabilities from 0 → 1, collaborating directly with product teams to bring AI features to life.
Wealthsimple is on a mission to help everyone achieve financial freedom by reimagining what it means to manage your money.
The AI Engineer develops and deploys agentic AI solutions for clients, implements components for document processing, workflow automation, data retrieval, and structured output generation, and contributes to monitoring, logging, metrics, and guardrail configurations for agentic systems.
AHEAD builds platforms for digital business by weaving together advances in cloud infrastructure, automation and analytics, and software delivery.
- Design AI orchestration patterns for agent handoffs and state flow.
- Build and maintain our AI orchestration layer.
- Design data pipelines to capture insights from patient/member interactions.
Blooming Health is a mission-driven, venture-backed health tech company transforming the social care landscape.
In this role, you will drive the strategic and technical direction of AI engineering across multiple product lines, shaping how advanced AI capabilities power user-facing automation experiences. You will lead high-impact engineering teams, guiding experimentation, architecture decisions, and end-to-end delivery of AI-driven solutions. Working closely with product, design, and go-to-market teams, you will help define the roadmap, unify core AI initiatives, and turn bold ideas into scalable outcomes.
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.
- Review and map existing workflows for downloading, uploading, and processing reports.
- Reduce manual work and mistakes through automation and AI validation.
- Build AI agents trained on our internal guides, documentation, datasets, and historical Slack conversations.
GNO Partners helps Amazon FBA sellers scale their businesses by implementing proven systems and strategies. Their fully remote team of experts has worked with over 600 brands and is currently partnered with more than 300 active clients.
- Build the shared AI execution platform that powers every AI product.
- Shape the unified architecture, guide tough tradeoffs, and elevate engineering standards.
- Ensure the platform is scalable, safe, cost-efficient, and easy to build on.
Zapier builds a platform to help millions of businesses scale with automation and AI, aiming to make automation work for everyone.
- Own end-to-end implementation of AI-powered product features, from prototypes to production.
- Mentor other engineers on the team, leveling up the team as a whole.
- Collaborate across the organization to support shipping these features to production.
Honeycomb is a service for the near and present future, defining observability and raising expectations of what developer tools can do!
- Build and maintain gen AI prompts aligned with ad formats and community dynamics.
- Improve the quality and brand safety of model outputs across text, images, and video.
- Partner with Product and Engineering to prioritize improvements and accelerate feature development.
Reddit is a community built on shared interests and trust, home to open conversations and one of the internet’s largest sources of information.
This is a player-coach role where you'll architect and build AI-powered engagement systems while developing a high-performing team. You will translate complex ML systems into accessible, scalable solutions that drive member engagement and business growth. You'll have the autonomy to shape our technical strategy and the resources to build a world-class team.
Sword Health is shifting healthcare from human-first to AI-first through its AI Care platform, making world-class healthcare available anytime, anywhere.