Define quality metrics, build evaluation datasets, and design rubrics for LLM-generated technical documentation across different content types and languages.
Build benchmarking and experimentation infrastructure, including automated evaluation pipelines and CI-integrated tooling for A/B comparisons and regression detection.
Develop automated quality signals at scale, monitor trends, and run experiments to quantify tradeoffs and inform decisions on model selection and pipeline architecture.
Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
Smartsheet has been helping people and teams achieve for over 20 years. They are building tools that empower teams to automate the manual, uncover insights, and scale smarter.
Creatively write prompts and responses on a wide range of topics
Perform LLM annotation and evaluation tasks (ranking, scoring, labeling, tagging)
Evaluate model outputs for accuracy, relevance, and instruction-following
Welo Data is an AI services company that specializes in data annotation. They deliver high-quality training data transformation solutions for NLP-enabled machine learning by blending technology and human intelligence to collect, annotate, and evaluate all content types.
Conduct experiments with LLMs and evaluate different architectures and techniques to improve conversational AI quality.
Develop and maintain robust evaluation frameworks to assess model performance, accuracy, and user satisfaction using offline and online metrics.
Optimize models for inference, improving speed, efficiency, and scalability for production environments.
Social Discovery Group (SDG) unites millions of users on dozens of products, solving loneliness, isolation, and disconnection by transforming virtual intimacy into the new normal. Their international team of 1000+ professionals works remotely from various locations, and they've been recognized as a "Great Place to Work".
Vetto is a global talent platform connecting top-tier professionals to high-impact AI projects around the world. Their mission is to build trust, quality, and long-term value in the AI ecosystem, both for exceptional talent and for companies operating at the frontier of technology.
Design, implement, and evaluate machine learning models and AI algorithms.
Develop and optimize prompts for LLMs to improve model outputs.
Collaborate with software engineers, data scientists, and product teams.
Cadre AI is focused on building and optimizing AI-powered platforms, bringing together cutting-edge technologies and expertise in machine learning and large language models. The team is dedicated to advancing AI capabilities and applying them to real-world challenges through scalable, high-impact solutions.
Design and develop an AI-powered productivity analytics platform.
Build scalable LLM pipelines and create a meta-workflow system.
Develop system-level prompt engineering and build an evaluation framework for AI output quality control.
Appflame is a Ukrainian product-driven tech company committed to building world-class products. They have 500+ team members, offices in Kyiv, London, and Limassol, and a co-working hub in Warsaw. They value bold, driven people who are passionate about building real products.
Build and maintain context infrastructure for AI tools.
Design and run evaluation frameworks for AI-generated insights.
Build and orchestrate AI agent systems for analytics tools.
Airtable is a no-code app platform empowering people to accelerate critical business processes. More than 500,000 organizations rely on Airtable to transform how work gets done.
Creatively writing prompts and responses on a wide range of topics.
Leading labeling initiatives with third-party firms and internal customers.
Creating and updating detailed guidelines and specifications for stakeholders.
Welo Data provides AI services, specifically data annotation. They enable brands and companies to reach, engage, and grow international audiences, delivering multilingual content transformation services in translation, localization, and adaptation.
Set the technical vision and reference architecture for agentic AI across applications.
Build and govern reusable platform components to accelerate adoption across teams.
Drive cross-functional roadmaps and integration standards across OCIO and business teams.
PointClickCare helps providers deliver exceptional care. They are a leading health tech company that’s founder-led and privately held, empowering their employees to push boundaries, innovate, and shape the future of healthcare.
Conduct fundamental LLM research using our SOTA story engine.
Create a benchmark for evaluating LLM behavior.
Deliver a benchmark library and a written report of compiled results.
Latitude is building the future of AI-native games by creating a platform where developers and creators can build entirely new kinds of interactive worlds. Latitude is a team of high-agency builders and storytellers who thrive on craft, curiosity, and community.
Lead Agent Development: Drive the development of Owkin’s Data Transformation Agent (DTA).
Orchestrate Data Workflows: Design, implement, and maintain complex data transformation workflows.
Ensure Code Excellence: Define and enforce robust engineering practices.
Owkin is an AI company on a mission to solve the complexity of biology. They are building the first Biological Artificial Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software.
Design features connecting natural language queries with a large corpus of legal knowledge.
Build a data architecture you are proud to highlight.
Use unstructured data to build large-scale datasets.
Trellis Law is the leading provider of state trial court data in the U.S. They leverage AI and Machine Learning to analyze hundreds of millions of state trial court documents, transforming complex data into actionable insights. Founded in 2018, Trellis has experienced rapid growth and is now trusted by many of the nation’s largest law firms and corporate legal teams.
Architect and build agentic workflows that combine large language models, reasoning components, and data pipelines to create adaptive, goal-driven conversational systems
Lead the design and development of advanced ML/NLP products from ideation to production, including model training, evaluation, optimization, and deployment
Drive experimentation with new approaches for agentic reasoning, coordination, and autonomous system design
SmartRecruiters is the Recruiting AI Company that transforms hiring for the world’s leading enterprises. Built for global scale, SmartRecruiters, an SAP company, delivers an AI-powered hiring platform that automates and optimizes the entire talent acquisition process, ensuring faster and smarter hiring decisions. They are a values-driven, globally focused tech company with strong financial backing and a bold vision for the future of work.
Build reproducible pipelines that transform raw data into structured, analysis-ready outputs, with validation and logging built in.
Global Strategy Group (GSG) is a leading public opinion research and communications firm working at the intersection of politics, policy, and public affairs. With a team of 150+ talented professionals, it protects and builds corporate reputations, influences public affairs decision makers, advocates on important social issues, and wins campaigns.
Design and implement scalable ML infrastructure to support model development and deployment
Develop and maintain evaluation frameworks for Large Language Models (LLMs), including RAG-based systems
Evaluate model performance using tools such as RAGAS, DeepEval, or similar frameworks
EX Squared LATAM collaborates with global clients to build innovative digital solutions that drive real business impact. They foster a collaborative, inclusive, and innovation-driven culture where continuous learning and professional growth are at the core of everything they do.
Design, prototype, and deploy Generative AI solutions across client-facing and internal platforms.
Build and optimize applications using large language models (LLMs), vector databases, prompt engineering, and RAG pipelines.
Lead development of AI agents for both digital and voice channels, supporting real-time interactions with clients and internal users.
National Debt Relief, founded in 2009, aims to help consumers deal with overwhelming debt. They are a debt settlement organization that has helped over 450,000 people settle over $10 billion of debt, striving to empower them to lead a healthier financial lifestyle.
Build scalable, production-grade LLM services and agentic workflows, alongside traditional ML systems where appropriate.
Hiflylabs is a team of 250+ data and tech enthusiasts based in Budapest. They focus on data engineering, data science, artificial intelligence and application development, working on a wide range of projects around the world. Hiflylabs values its people and is committed to nurturing their personal and professional development through a mentoring system.
Architect and build automation pipelines that replace high-volume, repeatable content tasks.
Design and develop LLM-powered tooling that enables agentic content creation workflows.
Build and maintain integrations across ServiceNow’s content platforms, knowledge management systems, and AI services.
ServiceNow started in 2004 and stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations.
Design and implement end-to-end AI solutions for document understanding and automated report generation.
Build and deploy LLM-based systems, including RAG pipelines, to retrieve and combine context from multiple data sources.
Work with unstructured and semi-structured data such as PDFs, documents, images, and historical records, transforming it into usable inputs for AI systems.
Smart Working believes your job should not only look right on paper but also feel right every day. They aim to connect skilled professionals with outstanding global teams and products for full-time, long-term roles in a genuine community that values growth and well-being.