Design and implement comprehensive evaluation frameworks that reflect real-world task success for agentic systems, with a focus on human+AI collaboration outcomes.
Build benchmarking pipelines that capture nuanced success indicators including trust calibration, intervention frequency, and agent handoff quality.
Collaborate with researchers, engineers, and product teams to align evaluation methodologies with business and user goals.
Design, develop, and maintain a robust platform to enable users to create and manage AI agents and their interactions.
Integrate and work with multiple LLMs, ensuring seamless orchestration and scalability for both individual and coordinated agent operations.
Leverage orchestration frameworks like LangGraph and others to build complex workflows and pipelines that support diverse agent functionalities, including frameworks for multi-agent coordination.
ClickUp is building the future of work by creating a converged AI workspace that unifies tasks, docs, chat, calendar, and enterprise search. Their AI-powered platform helps teams break free from silos and unlock new levels of productivity.
This role validates Veeva AI Agents through evaluation. You will define strategies for new AI Agents. The role involves analysis of model behaviors to identify defects.
Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.
Design, develop, and deploy agentic AI solutions for clients.
Build multi-agent systems and integrate models with enterprise systems.
Collaborate with clients and engineers to create scalable solutions.
AHEAD builds platforms for digital business, weaving together advances in cloud infrastructure, automation, analytics, and software delivery to help enterprises deliver on digital transformation. They prioritize creating a culture of belonging where all perspectives are valued and heard.
Design complex LLM prompts that accurately represent real customer journeys and service interactions.
Partner with Field Engineers to transform raw data into structured, high-quality tasks for model training.
Annotate and review tasks to ensure strict quality standards and alignment with expected customer outcomes.
Welo Data works with technology companies to provide datasets that are high-quality, ethically sourced, relevant, diverse, and scalable to supercharge their AI models.
Build resilient AI Agents using LangGraph and microservices. Develop complex automation workflows in n8n. Collaborate with Internal Business Analysts to focus on coding, not guessing requirements.
Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security. At Gcore, you’ll help design and deliver the foundation for an AI-driven world.
Designing, developing, and deploying generative AI models.
Architecting and building agentic systems with autonomous decision-making capabilities.
Integrating generative AI and agentic solutions into existing products and services.
Jobgether is a partner company that focuses on connecting talent with the right job opportunities. Their AI-powered matching process ensures applications are reviewed quickly, objectively, and fairly against the role's core requirements.
Lead domain-specific model optimization using PEFT (LoRA/QLoRA) and knowledge distillation to balance cost, latency, and reasoning capability.
Build next-gen Retrieval-Augmented Generation pipelines using hybrid search, cross-encoders, and self-correcting retrieval loops.
Design and deploy multi-agent systems using frameworks like LangGraph or CrewAI, enabling autonomous task planning and tool use (function calling).
ServiceNow is a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work.
Evaluate AI model outputs related to artists, performers, and athletes.
Develop prompts for AI models reflecting your field of expertise.
Deliver feedback to strengthen the model’s understanding of workplace tasks and language.
Handshake is recruiting Agents and Business Managers of Artists, Performers, and Athlete Professionals to contribute to an hourly, temporary AI research project.
Incorporating the best research work on agents and code generation into the OpenHands framework.
Developing novel improvements in areas of interest to raise agent performance and efficiency.
Running and implementing evaluations to ensure agent quality.
OpenHands is building an open-source AI platform that empowers engineering teams to accelerate development, automate workflows, and integrate intelligent coding assistance into real-world software delivery. The company fosters a culture built on kindness, candor, autonomy, and learning.
As a Principal Decision Scientist, you will define high-level business objectives directly with clients, then develop and execute the project plan to meet those objectives. You will provide technical leadership to guide development work across teams while also owning and delivering specific technical components yourself. You will design and develop feature engineering pipelines, build ML & AI infrastructure, deploy models, and orchestrate advanced analytical insights.
Aimpoint Digital is a premier analytics consulting firm with a mission to drive business value for clients through expertise in data strategy, data analytics, decision sciences
Design, build, and scale enterprise-grade AI/ML systems that power internal workflows and external-facing AI/ML platforms.
Develop a production-ready Generative AI and MLOps platform with reusable components used to deploy multiple AI solutions across Natera’s business units.
Implement cloud-native infrastructure for large-scale model training and serving using Kubernetes, MLflow, Terraform, and AWS-native services.
Natera is a global leader in cell-free DNA (cfDNA) testing. They are dedicated to oncology, women’s health, and organ health, aiming to make personalized genetic testing and diagnostics part of the standard of care. The Natera team consists of highly dedicated statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers and many other professionals from world-class institutions.
Design, develop, and test AI agents to support business objectives and improve operational outcomes.
Integrate agents with enterprise data sources, APIs, and workflows to ensure seamless functionality.
Translate evolving AI capabilities into actionable business and sales use cases.
Highstreet is developing next-generation agentic AI solutions that empower public sector and education (SLED) clients to achieve real-world business outcomes. The company seems to have a modern, flexible workplace culture built for collaboration and growth.
Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows.
Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
Improve reliability, performance, and safety across existing Python codebases.
Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.
Design, develop, and deploy automation workflows that reduce manual effort and improve accuracy across back-office functions.
Apply artificial intelligence, machine learning, and modern automation tools to solve complex operational challenges and unlock new efficiencies.
Collaborate with Client Operations, Finance, People, and other G&A teams to identify automation opportunities, gather requirements, and deliver solutions.
Engine is transforming business travel into something personalized, rewarding, and simple. They are building a platform that brings together corporate travel, a powerful charge card, and modern spend management in one place. More than 20,000 companies already rely on Engine.
Handshake is connecting students, new grads, and young professionals with job opportunities. They aim to close the opportunity gap and ensure everyone has equal access to meaningful employment.
Implement AI-enabled backend services within a secure, cloud-native microservices environment.
Design and scale Intelligent Document Processing (IDP) pipelines to extract and validate data from claims, authorizations, and medical documentation.
Integrate LLMs and intelligent agents into clinical and claims workflows to streamline patient and provider interactions.
EZ Labs is committed to transforming healthcare delivery through technology, innovation, and compassion. They partner with care teams, payers, and providers to improve how patients experience care. The company integrates advanced analytics and secure platforms to enable smarter decisions.
Leverage professional experience to evaluate AI models in your field. Assess content related to your field of work. Deliver feedback that strengthens the model’s understanding.
Handshake is recruiting Farm Labor Contractor Professionals to contribute to an hourly, temporary AI research project. No AI experience is needed.