Jobs Similar to Applied Data Scientist, LLM Evaluation

Applied Data Scientist, LLM Evaluation

Driver 6 hours ago

$175,000–$275,000/yr

US Unlimited PTO

Define quality metrics, build evaluation datasets, and design rubrics for LLM-generated technical documentation across different content types and languages.
Build benchmarking and experimentation infrastructure, including automated evaluation pipelines and CI-integrated tooling for A/B comparisons and regression detection.
Develop automated quality signals at scale, monitor trends, and run experiments to quantify tradeoffs and inform decisions on model selection and pipeline architecture.

Statistics Python Data Storytelling

20 jobs similar to Applied Data Scientist, LLM Evaluation

Jobs ranked by similarity.

Senior Software Engineer II - Applied AI and Evaluations

Smartsheet 11 days ago

$175,000–$245,000/yr

Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic

Smartsheet has been helping people and teams achieve for over 20 years. They are building tools that empower teams to automate the manual, uncover insights, and scale smarter.

Generative AI Data Analyst

Welo Data 24 days ago

$27/hr

Creatively write prompts and responses on a variety of topics
Perform LLM annotation and evaluation tasks (ranking, scoring, labeling, tagging)
Evaluate model outputs for accuracy, relevance, and instruction-following

Welo Data is an AI services company that specializes in data annotation. They deliver high-quality training data transformation solutions for NLP-enabled machine learning by blending technology and human intelligence to collect, annotate, and evaluate all content types.

Senior NLP / LLM Engineer

Social Discovery Group 15 days ago

Global 6w PTO

Conduct experiments with LLMs and evaluate different architectures and techniques to improve conversational AI quality.
Develop and maintain robust evaluation frameworks to assess model performance, accuracy, and user satisfaction using offline and online metrics.
Optimize models for inference, improving speed, efficiency, and scalability for production environments.

Social Discovery Group (SDG) unites millions of users on dozens of products, solving loneliness, isolation, and disconnection by transforming virtual intimacy into the new normal. Their international team of 1000+ professionals works remotely from various locations, and they've been recognized as a "Great Place to Work".

Post-Training Research Scientist (LLMs) — Experimental Track

Vetto 18 days ago

Europe

Design and run post-training experiments on frontier and open-weight LLMs (SFT, preference-based methods, rubric-driven training)
Translate raw annotation artifacts (multi-step solutions, evaluations, adversarial prompts) into training-ready datasets.
Prototype new reward signals beyond pairwise preferences (rubrics, constraints, structured critics).

Vetto is a global talent platform connecting top-tier professionals to high-impact AI projects around the world. Their mission is to build trust, quality, and long-term value in the AI ecosystem - for both exceptional talents and companies operating at the frontier of technology.

AI Engineer (LLMs & Generative AI)

Cadre AI 18 days ago

Design, implement, and evaluate machine learning models and AI algorithms.
Develop and optimize prompts for LLMs to improve model outputs.
Collaborate with software engineers, data scientists, and product teams.

Cadre AI is focused on building and optimizing AI-powered platforms, bringing together cutting-edge technologies and expertise in machine learning and large language models. The team is dedicated to advancing AI capabilities and applying them to real-world challenges through scalable, high-impact solutions.

AI Engineer

Appflame 23 days ago

Global

Design and develop an AI-powered productivity analytics platform.
Build scalable LLM pipelines and create a meta-workflow system.
Develop system-level prompt engineering and build an evaluation framework for AI output quality control.

Appflame is a Ukrainian product-driven tech company committed to building world-class products. They have 500+ team members and offices in Kyiv, London, Limassol, and a co-working hub in Warsaw; they value bold, driven people who are passionate about building real products.

AI Analytics Engineer (AI & Analytics Platform)

Airtable 17 days ago

$141,600–$193,600/yr

Build and maintain context infrastructure for AI tools.
Design and run evaluation frameworks for AI-generated insights.
Build and orchestrate AI agent systems for analytics tools.

Airtable is a no-code app platform empowering people to accelerate critical business processes. More than 500,000 organizations rely on Airtable to transform how work gets done.

Generative AI Data Analyst

Welo Data 24 days ago

$36/hr

Creatively writing prompts and responses to a variety of diverse topics.
Leading labeling initiatives with third party firms and internal customers.
Creating and updating detailed guidelines and specifications for stakeholders.

Welo Data provides AI services, specifically data annotation. They enable brands and companies to reach, engage, and grow international audiences, delivering multilingual content transformation services in translation, localization, and adaptation.

Principal AI Engineer

PointClickCare 15 days ago

$179,000–$199,000/yr

Set the technical vision and reference architecture for agentic AI across applications.
Build and govern reusable platform components to accelerate adoption across teams.
Drive cross-functional roadmaps and integration standards across OCIO and business teams.

PointClickCare helps providers deliver exceptional care. They are a leading health tech company that’s founder-led and privately held, empowering their employees to push boundaries, innovate, and shape the future of healthcare.

PhD AI Research Intern

Latitude 8 days ago

Conduct fundamental LLM research using our SOTA story engine.
Create a benchmark for evaluating LLM behavior.
Deliver a benchmark library and a written report of compiled results.

Latitude is building the future of AI-native games by creating a platform where developers and creators can build entirely new kinds of interactive worlds. Latitude is a team of high-agency builders and storytellers who thrive on craft, curiosity, and community.

Agent Development Engineer

Owkin 12 days ago

Europe

Lead Agent Development: Drive the development of Owkin’s Data Transformation Agent (DTA).
Orchestrate Data Workflows: Design, implement, and maintain complex data transformation workflows.
Ensure Code Excellence: Define and enforce robust engineering practices.

Owkin is an AI company on a mission to solve the complexity of biology. They are building the first Biological Artificial Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software.

AI Engineer

Trellis Law 3 days ago

$150,000–$200,000/yr

US Unlimited PTO

Design features connecting natural language queries with a large corpus of legal knowledge.
Build a data architecture you are proud to highlight.
Use unstructured data to build large scale data sets.

Trellis Law is the leading provider of state trial court data in the U.S. They leverage AI and Machine Learning to analyze hundreds of millions of state trial court documents, transforming complex data into actionable insights. Founded in 2018, Trellis has experienced rapid growth and is now trusted by many of the nation’s largest law firms and corporate legal teams.

Senior AI Engineer

SmartRecruiters 10 days ago

Global

Architect and build agentic workflows that combine large language models, reasoning components, and data pipelines to create adaptive, goal-driven conversational systems
Lead the design and development of advanced ML/NLP products, from ideation to production - including model training, evaluation, optimization, and deployment
Drive experimentation with new approaches for agentic reasoning, coordination, and autonomous system design

SmartRecruiters is the Recruiting AI Company that transforms hiring for the world’s leading enterprises. Built for global scale, SmartRecruiters, an SAP company, delivers an AI-powered hiring platform that automates and optimizes the entire talent acquisition process, ensuring faster and smarter hiring decisions. They are a values-driven, globally focused tech company with strong financial backing and a bold vision for the future of work.

Prompt Engineer

Welo Data 1 day ago

Utilize Automatic Prompt Generation (APG) tools to create baseline prompts.
Run and supervise the Automated Prompt Optimization (APO) tool.
Manually draft, test, and refine prompts to navigate complex template architectures.

Welo Data is an AI services company specializing in data validation and AI solutions.

AI Workflow & Automation Engineer (Temporary)

Global Strategy Group 15 days ago

$10,000–$15,000/mo

Global

Design and implement production-ready tools that integrate LLMs and other automation techniques into research workflows.
Write modular, testable, maintainable Python code grounded in strong engineering principles.
Build reproducible pipelines that transform raw data into structured, analysis-ready outputs, with validation and logging built in.

Global Strategy Group (GSG) is a leading public opinion research and communications firm working at the intersection of politics, policy, and public affairs. With a team of 150+ talented professionals, it protects and builds corporate reputations, influences public affairs decision makers, advocates on important social issues, and wins campaigns.

Machine Learning Engineer

EX Squared LATAM 16 days ago

LATAM

Design and implement scalable ML infrastructure to support model development and deployment
Develop and maintain evaluation frameworks for Large Language Models (LLMs), including RAG-based systems
Evaluate model performance using tools such as RAGAS, DeepEval, or similar frameworks

EX Squared LATAM collaborates with global clients to build innovative digital solutions that drive real business impact. They foster a collaborative, inclusive, and innovation-driven culture where continuous learning and professional growth are at the core of everything they do.

Principal Engineer, Applied AI

National Debt Relief 6 days ago

$171,000–$196,500/yr

Design, prototype, and deploy Generative AI solutions across client-facing and internal platforms.
Build and optimize applications using large language models (LLMs), vector databases, prompt engineering, and RAG pipelines.
Lead development of AI agents for both digital and voice channels, supporting real-time interactions with clients and internal users.

National Debt Relief, founded in 2009, aims to help consumers deal with overwhelming debt. They are a debt settlement organization that has helped over 450,000 people settle over $10 billion of debt, striving to empower them to lead a healthier financial lifestyle.

AI Engineer (Europe)

Hiflylabs 23 days ago

Europe

Own the architecture and delivery of production-grade LLM systems and classical ML solutions.
Design, evaluate, and optimize RAG pipelines (retrieval strategy, chunking, indexing, monitoring).
Build scalable, production-grade LLM services and agentic workflows, alongside traditional ML systems where appropriate.

Hiflylabs is a team of 250+ data and tech enthusiasts based in Budapest. They focus on data engineering, data science, artificial intelligence and application development, working on a wide range of projects around the world. Hiflylabs values its people and is committed to nurturing their personal and professional development through a mentoring system.

Principal Full-Stack Engineer, Experience Team

ServiceNow 9 days ago

$199,100–$348,400/yr

North America Canada

Architect and build automation pipelines that replace high-volume, repeatable content tasks.
Design and develop LLM-powered tooling that enables agentic content creation workflows.
Build and maintain integrations across ServiceNow’s content platforms, knowledge management systems, and AI services.

ServiceNow started in 2004 and stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations.

Applied AI Engineer

Smart Working 30 days ago

Global

Design and implement end-to-end AI solutions for document understanding and automated report generation.
Build and deploy LLM-based systems, including RAG pipelines, to retrieve and combine context from multiple data sources.
Work with unstructured and semi-structured data such as PDFs, documents, images, and historical records, transforming it into usable inputs for AI systems.

Smart Working believes your job should not only look right on paper but also feel right every day. They aim to connect skilled professionals with outstanding global teams and products for full-time, long-term roles in a genuine community that values growth and well-being.
