Partner with full-stack and backend engineers on the features they are shipping, write tests that prove those features work, and flag gaps early.
Help build and run evaluation pipelines for non-deterministic LLM outputs, prompt regression, model drift detection, and output quality scoring across the LiteLLM routing layer.
Test the Nango-based integration layer across connectors and the file ingestion pipeline, including encryption, formatting edge cases, and audit trail continuity.
Improve prompts, model selection, and tool usage so the system gets more decisions right over time.
Reduce latency, token usage, and cost while preserving decision quality and operational reliability.
Design validation, retries, and human review paths for ambiguous, adversarial, incomplete, or conflicting inputs.
Risk Labs is the core team behind UMA and Across, building infrastructure that pushes crypto forward. They value ownership, curiosity, thoughtful risk-taking, and direct communication.
Monitor and improve the quality assurance process.
Prepare detailed testing feedback to help the team improve AI models.
Netomi is the leading agentic AI platform for enterprise customer experience. They work with the largest global brands and are backed by WndrCo, Y Combinator, and Index Ventures, helping enterprises drive efficiency, lower costs, and deliver higher quality customer experiences.
Design and build platform-level test infrastructure for iOS and Android.
Apply AI-driven failure classification to reduce noise and prioritize signal.
Partner with the AI Native platform team to ship net-new skills.
Life360's mission is to keep people close to the ones they love by empowering members to protect people, pets, and things they care about most with services, including location sharing and safe driver reports. They are a remote-first company of more than 500 employees.
Pick up live work across data ingestion, knowledge graph integration, and the application layer.
Contribute to the front-end and runtime layer that surfaces AI agent activity, recommendations, and human-in-the-loop governance to client users.
Move freely between Python backend, TypeScript frontend, and infrastructure work as the build demands.
Peach Pilot builds a platform that ingests everything about how a company operates and constructs a Company Brain: a living knowledge graph that connects people, decisions, and outcomes across the entire organization. They are co-founded by Mario Montag and JP James and have a working platform with live infrastructure and a proven data-to-insights methodology.
Lead a team of 6-10 Automation Quality Engineers and drive a transition toward agentic quality engineering.
Define quality architecture and co-design deterministic agent workflows and non-deterministic approaches.
Build tools that make the broader engineering org faster and more quality-conscious.
airSlate is a global SaaS technology company that develops no-code workflow automation, electronic signature, and document management solutions. They have teammates in more than 20 countries across three continents, with main hubs in the United States, Poland, Romania, Ukraine, and the Philippines, and are in an exciting phase of growth and transformation.
Write, iterate, and maintain system prompts and instruction sets for Noodle’s AI agents across the student journey.
Build and maintain evaluation frameworks to measure agent accuracy, tone, hallucination rate, task completion, and alignment with rubric-based learning objectives.
Partner with Noodle teammates and university stakeholders to design, build, and test agents — translating learning objectives, operational flows, rubric assessments, and more into prompt-level agent instructions.
Noodle is higher education’s leading strategy, services, and technology partner that develops infrastructure, provides life-changing learning experiences, and grows the awareness of and the enrollment in some of the best academic institutions in the world. They empower universities to change the world by offering university partners various products and services.
Be the AI engineering technical authority and set the technical standard for how AI is used in code generation, review, testing, and task automation.
Drive the architecture and implementation of automated PR Review, Android and API Test Automation and AI Agent Swarms.
Evangelize by shipping, not by presenting; measure everything and report quantified value delivered to the CTO monthly.
SpotOn provides independent restaurants with tools to compete and win, including point-of-sale systems and AI-powered profit tools. They are known for their innovative software and technology solutions and are a Great Places to Work recipient.
Talk to people, then build things, working directly with business and engineering teams to understand what's slowing them down.
Own the whole thing by prototyping, hardening, deploying, and monitoring internal tools that need to work reliably.
Write code other people can maintain, building clean systems and establishing practical patterns for secure AI usage.
Promenade empowers local businesses with products and services that allow them to thrive online and offline. They build vertically-focused software catered to each industry, leveling the playing field between small businesses and large aggregators. They are backed by industry investors.
Design, build, and ship agentic workflows across multiple domains.
Build multi-step agents capable of autonomous planning, context tracking, memory, tool use, and API orchestration.
Drive technical and architectural decisions to meet product requirements while also anticipating and designing for future needs.
Cority helps customers see and prevent risks across their operations in real time. Their EHS+ platform converges people, data, and AI agents to provide a clear view of information people can trust. For 40 years, Cority has been the market leader in EHS+, recognized by top analysts and trusted by more than 1,500 of the most complex organizations worldwide.
Interact with generative AI models using project-provided guidelines, safety taxonomies, and attack-vector guidance.
Create and evaluate prompts designed to test model behavior across safety-related categories.
Identify where model responses become unsafe, noncompliant, inconsistent, or otherwise problematic.
Welo Data is an AI services company that specializes in data annotation. They deliver multilingual content transformation services in translation, localization, and adaptation for over 250 languages with a growing network of over 400,000 in-country linguistic resources.
Design and ship agentic systems and multi-step LLM workflows using Claude, OpenAI, or equivalent, including tool use, memory, structured output extraction, and failure handling.
Build and maintain MCP integrations connecting internal tools, portco systems, and external data sources into reliable, observable pipelines.
Write production-grade Python for data pipelines, integration scripts, and scheduled jobs running via BullMQ-backed queues on the Node/TypeScript stack.
Emergence is a PE holdco backed by the Pritzker Organization focused on acquiring and scaling B2B SaaS businesses. It combines operational rigor with a growth equity mindset to drive ARR growth and profitability across its portfolio.
Define and drive the QA roadmap across the organization.
Contribute directly to test planning, automation strategy, debugging, exploratory testing, and quality investigations.
Lead, mentor, and manage the QA engineering team, including performance management and workforce planning.
Blooming Health is on a mission to transform social care with their AI-powered platform that identifies social needs. They support 1,000+ community organizations across the US, helping millions of members access the support they need to stay healthy.
Collaborate with engineering and design to optimize prompt engineering frameworks for open-ended generative AI features.
Research customer interaction models from LLMs to downstream features.
Evaluate the evolving AI ecosystem, including the ChatGPT store and third-party LLM integrations.
Acorns is a financial wellness app that helps everyday people and families save and invest money for the long term. Since 2014, Acorns has grown into a global company with multiple life-stage products serving the needs of kids, teens, adults, and parents.
Design and build high-quality APIs and services for the international adaptation of Life360.
Work with AI (Claude Code) as a first-class collaborator.
Define and codify AI-Native engineering practices for the International team.
Life360's mission is to keep people close to the ones they love. Their category-leading mobile app, Tile tracking devices, and Pet GPS tracker empower members to protect the people, pets, and things they care about most. Life360 has more than 500 (and growing!) remote-first employees.
Design and build high-quality features for aging parents, pre-teen wearable users, and community groups: experiences that are safe, intuitive, and genuinely useful.
Work with AI (Claude Code) as a first-class collaborator; your primary workflow involves orchestrating agents to create specs, generate code and tests, verify results, and perform reviews.
Help define and codify AI-Native engineering practices for the Circle Expansion team, establishing playbooks the broader org can adopt.
Life360's mission is to keep people close to the ones they love. Their category-leading mobile app, Tile tracking devices, and Pet GPS tracker empower members to protect the people, pets, and things they care about. They have about 500 remote-first employees. The company is AI Native.
Quickly iterate and develop proofs of concept to explore integrating AI into data and marketing workflows.
Make key decisions about the choice of AI architecture and frameworks.
Build production data agents to seamlessly answer analytics and data science questions.
Hightouch is an Agentic Marketing Platform that provides a composable CDP. They enable marketing teams to analyze performance, brainstorm ideas, and generate creative quickly. The team is ambitious and impact-driven, with a focus on humility, kindness, and compassion.
Own end-to-end execution of AI agent deployments from discovery and scoping through launch and optimization.
Configure agent workflows, decision logic, and automation behaviors to maximize accuracy, reliability, and business outcomes.
Implement guardrails and validation frameworks to ensure safe, compliant, and predictable agent performance.
Level AI is transforming how enterprises understand and engage with their customers. Their AI-native CX platform combines conversation intelligence, real-time agent guidance, and AI Virtual Agents to help brands deliver exceptional customer experiences at scale. At Level AI, they operate with urgency, ownership, and a deep customer-first mindset.
Shape technical direction and architecture: Define the foundational architecture for enterprise agentic AI at Benchling.
Build and ship the early portfolio yourself: Write production code at least half your time, particularly during the team's first year.
Design for enterprise from day one: Build for multi-tenant isolation, secrets management, audit logging, payload encryption, role-based access controls, and human-in-the-loop controls calibrated to risk.
Benchling is the AI platform for biotech R&D. Scientists use Benchling to design experiments, capture structured data, and run AI agents and models directly in their workflows. They have over 200,000 scientists around the world, from academic labs to Sanofi and Moderna.
Evaluate and refine AI prototypes built by business units to enhance commercial ROI, security, and architecture.
Refactor high-value internal AI prototypes into secure, scalable, enterprise-grade applications.
Build and maintain secure LLM integrations with internal systems like data lakes and Salesforce, ensuring full-lifecycle maintenance of applications.
Impiricus is an AI-powered HCP Engagement Engine that ethically connects healthcare professionals to pharmaceutical resources to reduce go-to-market costs and accelerate patient access to treatments. It is a fast-growing company with a unique network of HCPs and advisors, fostering a collaborative and impactful culture where employees can work flexibly.
Identify and label languages and dialects from model-generated responses.
Review outputs from two different AI models and determine which model correctly identified the proposed language.
Compare model responses and select the appropriate evaluation outcome from predefined options.
RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.