Source Job

Europe

  • Design and run post-training experiments on frontier and open-weight LLMs (SFT, preference-based methods, rubric-driven training).
  • Translate raw annotation artifacts (multi-step solutions, evaluations, adversarial prompts) into training-ready datasets.
  • Prototype new reward signals beyond pairwise preferences (rubrics, constraints, structured critics).

Python PyTorch JAX ML AI

20 jobs similar to Post-Training Research Scientist (LLMs) — Experimental Track

Jobs ranked by similarity.

Europe

  • Completing AI training tasks such as analyzing, editing, and writing Python
  • Judging how well AI responds to Python-related prompts
  • Improving cutting-edge AI models

Prolific is building the biggest pool of quality human data in the world. Over 35,000 AI developers, researchers, and organizations use Prolific to gather data from paid study participants with a wide variety of experiences, knowledge, and skills.

Europe Unlimited PTO

  • Contribute to designing, evaluating, and shipping our mental health AI Agent and its supporting infrastructure.
  • Develop and maintain robust data pipelines to power model training and evaluation.
  • Partner with AI Research, Product, and Engineering teams to define new features.

Sword Health is shifting healthcare from human-first to AI-first through its AI Care platform. They aim to make world-class healthcare available anytime, anywhere, while significantly reducing costs. Backed by clinical studies and patents, Sword Health has raised more than $500 million from leading investors.

Europe

  • Contribute to the entire development cycle of our cutting-edge large deep learning models.
  • Collaborate across engineering and clinical teams to translate cutting-edge AI research into practical applications that have clinical implications.
  • Influence progress of relevant research communities by producing publications.

Sword Health is shifting healthcare from human-first to AI-first through their AI Care platform, making world-class healthcare available anytime, anywhere, while significantly reducing costs. Sword began by reinventing pain care with AI at its core, and has since expanded into women’s health, movement health, and more recently mental health.

Australia

  • Design and optimise AI-ready tools and APIs that enable LLM platforms to reliably interact with Canva's design capabilities.
  • Build and maintain evaluation frameworks to systematically measure tool-use accuracy across platforms.
  • Experiment with LLM orchestration and agent architectures: develop Canva agents that any third-party provider can call to design quickly, efficiently, and at scale.

Canva is a platform redefining how the world experiences design. They have a flagship campus in Sydney, with a second campus in Melbourne and co-working spaces in Brisbane, Perth, Adelaide, and Auckland, NZ.

$35–$50/hr
Global

  • Design and implement LLM-powered application workflows
  • Architect retrieval-augmented generation pipelines
  • Collaborate with backend architects to integrate AI services into APIs

They are seeking a hands-on AI Engineer with deep expertise in Large Language Model integration and production AI systems. The company's culture sounds innovative and collaborative, focusing on building scalable and secure AI applications.

  • Turn complex scientific processes into intelligent, automated workflows.
  • Design and implement end-to-end workflows connecting laboratory devices, APIs, datasets, and AI models.
  • Build conversational agents and intelligent bots to assist scientific teams in their daily work.

E184 builds technology for goals that matter most, working in labs around the world. They enable reproductive rights, build brain-computer interfaces, and aim to give everyone a say in the future.

Global

  • Design and develop an AI-powered productivity analytics platform.
  • Build scalable LLM pipelines and create a meta-workflow system.
  • Develop system-level prompt engineering and build an evaluation framework for AI output quality control.

Appflame is a Ukrainian product-driven tech company committed to building world-class products. They have 500+ team members and offices in Kyiv, London, Limassol, and a co-working hub in Warsaw; they value bold, driven people who are passionate about building real products.

$36/hr
US

  • Creatively writing prompts and responses to a variety of diverse topics.
  • Leading labeling initiatives with third-party firms and internal customers.
  • Creating and updating detailed guidelines and specifications for stakeholders.

Welo Data provides AI services, specifically data annotation. They enable brands and companies to reach, engage, and grow international audiences, delivering multilingual content transformation services in translation, localization, and adaptation.

Global 6w PTO

  • Collaborate with Cohere’s Modelling Safety team on implementing novel research ideas.
  • Conduct cutting-edge machine learning research, training and evaluating production large language models.
  • Focus on research projects aimed at making models better understood, safer, more reliable, more inclusive, and more beneficial for the world.

Cohere scales intelligence to serve humanity. They train and deploy frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search and agents. They are a team of researchers, engineers, and designers.

  • Design, implement, and evaluate machine learning models and AI algorithms.
  • Develop and optimize prompts for LLMs to improve model outputs.
  • Collaborate with software engineers, data scientists, and product teams.

Cadre AI is focused on building and optimizing AI-powered platforms, bringing together cutting-edge technologies and expertise in machine learning and large language models. The team is dedicated to advancing AI capabilities and applying them to real-world challenges through scalable, high-impact solutions.

$27/hr
US

  • Creatively writing prompts and responses to a variety of diverse topics
  • Perform LLM annotation and evaluation tasks (ranking, scoring, labeling, tagging)
  • Evaluate model outputs for accuracy, relevance, and instruction-following

Welo Data is an AI services company that specializes in data annotation. They deliver high-quality training data transformation solutions for NLP-enabled machine learning by blending technology and human intelligence to collect, annotate, and evaluate all content types.

Europe

  • Build and deploy AI models with RAG and tool calling for various product features.
  • Collaborate with frontend and backend engineers to bring AI features into production.
  • Stay updated with the latest AI research and propose innovative applications relevant to Finom’s mission.

Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one B2B financial solution integrating banking, accounting, financial management, and invoicing into a seamless, mobile-first platform, actively expanding across key EU markets.

US

  • Design and implement production-grade RAG pipelines and agentic workflows using Python.
  • Evaluate new models and prototype approaches for SBIR/government deliverables.
  • Document architectures and contribute to technical reports for contract deliverables.

Unstructured is focused on transforming unstructured data into a format usable by LLMs. Their Public Sector team works on high-impact contracts and seeks to bridge the gap between custom builds and a scalable product roadmap.

Global

  • Manages complex, strategic AI training data projects.

Welo Data works with technology companies to provide datasets that are high-quality, ethically sourced, relevant, diverse, and scalable to supercharge their AI models. As a Welocalize brand, Welo Data leverages over 25 years of experience in partnering with the world’s most innovative companies and brings together a curated global community of over 500,000 AI training and domain experts to offer services.

Europe

  • Own the architecture and delivery of production-grade LLM systems and classical ML solutions.
  • Design, evaluate, and optimize RAG pipelines (retrieval strategy, chunking, indexing, monitoring).
  • Build scalable, production-grade LLM services and agentic workflows, alongside traditional ML systems where appropriate.

Hiflylabs is a team of 250+ data and tech enthusiasts based in Budapest. They focus on data engineering, data science, artificial intelligence and application development, working on a wide range of projects around the world. Hiflylabs values its people and is committed to nurturing their personal and professional development through a mentoring system.

Europe

  • Build and ship AI-powered product and internal solutions using LLMs, RAG, tool calling, workflows, and agentic patterns
  • Design quality and evaluation frameworks for AI systems, including offline evals, online signals, failure analysis, and continuous improvement loops
  • Contribute to AI platform and tooling decisions that improve reuse, speed, and consistency across teams

Finom is a European tech startup headquartered in Amsterdam, revolutionizing the financial landscape for entrepreneurs. They develop an all-in-one B2B financial solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, and they nurture innovation in an inspiring work environment.

$90,000–$160,000/yr
US Unlimited PTO

  • Design, develop, and refine large language model workflows to steer and improve model behaviors.
  • Build language processing components for intent detection, summarization and conversational response quality.
  • Drive R&D-style exploration on cutting-edge speech and language systems, rapidly prototyping novel approaches.

Cresta's platform combines AI and human intelligence to help contact centers discover customer insights and behavioral best practices, automate conversations, and empower team members. They are led by founders with experience at Google, Waymo, and OpenAI, and are on a mission to revolutionize the workforce with AI.

Global

  • Develop and iterate realistic prompts to test the relevance and quality of AI-generated insights.
  • Systematically evaluate divergence between professional real estate judgment and AI outputs across asset classes/risk profiles.
  • Translate how REPE professionals evaluate acquisitions into problems that push the limits of AI reasoning.

Mentis AI operates at the intersection of institutional investment expertise and frontier AI systems. Their team combines asset management experience with machine learning and applied AI research, collaborating with leading AI labs to improve how models reason and make decisions in financial contexts.

US Canada

  • Bring deep expertise in machine learning and applied AI to turn emerging techniques into practical solutions.
  • Provide broad technical leadership across teams while remaining hands-on in applied research and innovation.
  • Guide major technical decisions, identify opportunities for differentiation, and translate new ideas into future product capabilities.

Kinaxis is a global leader in modern supply chain orchestration that powers complex global supply chains and supports the people who manage them. They have grown to become a global organization with over 2000 employees around the world, with 6 global offices and a best-in-class HQ in Ottawa, Canada.

$175,000–$300,000/yr
US

  • Plan, design, and run experiments to evaluate and refine deep learning architectures and training methodologies.
  • Research, implement, and adapt new algorithms and model architectures from recent ML papers.
  • Collaborate closely with talented engineers to translate research insights into robust, deployable models.

Compound Eye enables machines to understand their surroundings in 3D and in real time using only passive sensors. They have customers in automotive, agriculture, healthcare, and defense and are backed by Khosla Ventures and other leading investors, with a team of sixteen in the US. Their culture is based on transparency, mutual respect, and accountability.