Source Job

US

  • Design and evaluate reinforcement learning systems for agentic AI workflows, including RL environments, reward models, and post-training pipelines for LLM-based agents.
  • Develop simulation environments, reward functions, and evaluation frameworks for enterprise workflows.
  • Collaborate with researchers to translate research into practical enterprise solutions, with opportunities to publish and present findings.

Reinforcement Learning, Python, PyTorch, LLMs

14 jobs similar to PhD Research Intern - Applied Reinforcement Learning

Jobs ranked by similarity.

US, Unlimited PTO, 12w maternity, 6w paternity

  • Design, implement, and evaluate reinforcement learning algorithms for robotic control and motion planning.
  • Develop sim-to-real pipelines using simulation environments like Isaac Gym and MuJoCo.
  • Collaborate with cross-functional teams to deploy RL policies on physical robots.

Path Robotics builds AI-driven robotic systems that adapt and learn to close the skilled labor gap. The company is a growing team of intelligent, humble, and driven people who make the impossible possible together.

Switzerland

  • Conduct advanced research on agentic AI systems trained on real-world interaction data.
  • Design and experiment with learning frameworks such as RAG, fine-tuning, RLHF, DPO, and GRPO.
  • Develop multimodal representation learning approaches across text, audio, logs, and structured data.

Our partner is a global AI research organization focused on developing cutting-edge agentic and multimodal AI systems. It offers a collaborative environment with top-tier engineering and product teams.

Canada

  • Design and implement multi-agent AI systems using frameworks like LangChain and CrewAI, building agent-to-agent orchestration pipelines.
  • Fine-tune foundation models, integrate retrieval-augmented generation, and develop APIs and backend services for production deployment.
  • Containerize and deploy agents with Docker and Kubernetes, while collaborating with QA and product teams to benchmark accuracy and safety.

Innodata is a global data engineering company focused on enabling the responsible advancement of artificial intelligence by providing data, evaluation frameworks, and human expertise. With over 36 years of experience, the company delivers high-quality data solutions and services for Generative AI builders and adopters.

Global

  • Prototype and improve LLM-based features for tasks such as entity extraction, summarization, and document comparison.
  • Develop and evaluate prompt strategies, embedding approaches, and semantic search workflows.
  • Collaborate with engineers and product managers to integrate your work into the live platform.

Outmarket is building the AI-powered operating system for commercial insurance. Our team brings experience from Uber, Meta, Adobe, and IBM Watson, and we move quickly from idea to deployment.

Canada

  • Design and ship agentic workflows that plan, act, and re-plan in a loop, and build retrieval-augmented generation (RAG) pipelines that ground AI in real data.
  • Select the right model for each task, engineer prompts and context structures, and write evaluations to prove AI system correctness.
  • Connect AI tools to internal systems via APIs or MCP, review AI-generated code for correctness and security, and collaborate with the team to deliver complete solutions.

Vosyn is a trailblazing Language Synthesis AI firm that dissolves language barriers and empowers users through innovative AI solutions. Currently preparing for a significant IPO, the company fosters a culture of flexibility, continuous improvement, and solution-focused strategies where every idea is welcomed and nurtured.

India

  • Architect and ship production-grade agentic AI applications including multi-agent orchestration, retrieval systems, and evaluation pipelines.
  • Design and build learner-facing AI experiences and operator tools end-to-end using React and TypeScript.
  • Own production reliability for AI systems including model failover, rate limiting, cost monitoring, and incident response.

Chegg Skills builds applications that help motivated career switchers transition into high-growth roles. The company serves thousands of learners and educators each year through a high-ownership engineering team rethinking modern education.

Global, 6w PTO

  • Train and fine-tune the language models powering AI companions, and own agent harnesses, agentic loops, and chat interface algorithms.
  • Build and maintain the full LLM stack from model training to production deployment while tracking cutting-edge NLP research.
  • Collaborate with validation, content, and dataset preparation teams to design experiments and measure model quality.

Social Discovery Group is one of the world's largest groups of social discovery companies, solving loneliness and disconnection through social entertainment platforms like DateMyAge and Dating.com. The international team of 1000+ professionals works remotely worldwide and is a two-time 'Great Place to Work' winner.

United States

  • Build and evolve the agent harness and orchestration that turn an LLM into a reliable autonomous pentester.
  • Design tools and validation layers that keep the agent reliable, with structured outputs and production safety.
  • Own and grow evaluation infrastructure to measure and drive agent improvements.

Horizon3.ai is a fast-growing remote cybersecurity company that provides autonomous penetration testing through its NodeZero platform. The company fosters a culture of respect, collaboration, and ownership, with a team of former cyber operators and engineers.

United States, 6w PTO

  • Train, fine-tune, and optimize large language models powering AI companion and conversational systems at scale.
  • Design and maintain agentic frameworks and LLM orchestration systems, including reasoning loops and chat orchestration.
  • Research state-of-the-art NLP techniques and implement alignment methods such as RLHF and DPO to improve model quality.

We are an AI-powered job matching platform that connects candidates with hiring companies through objective, fair review processes. As a globally distributed, innovation-focused company, we foster a collaborative engineering culture with continuous learning opportunities.

US

  • Design and build AI applications powered by LLMs, RAG, semantic search, and AI agents.
  • Develop intelligent pipelines for document ingestion, knowledge extraction, and retrieval.
  • Build and optimize production AI services using Python and modern AI frameworks.

Implicit builds a leading AI Knowledge Engine for Maintenance and Support. The company is a small, experienced, and highly technical team tackling challenging real-world problems across defense, manufacturing, and customer support.

US

  • Develop and operate production-ready AI and machine learning systems for enterprise-scale products.
  • Build and optimize LLM-powered applications, RAG pipelines, and intelligent agents.
  • Implement software engineering best practices for AI development including CI/CD and testing.

Our partner is building enterprise-grade AI solutions that deliver measurable business impact. They offer a remote-friendly work environment with a collaborative engineering culture focused on innovation, quality, and continuous learning.

US

  • Build, ship, and own product features end-to-end using cutting-edge AI/ML techniques.
  • Apply classical ML and LLM-based approaches like RAG, prompt engineering, and fine-tuning to enhance the audit and risk platform.
  • Collaborate with cross-functional teams in an Agile environment to deliver scalable, production-quality code.

Optro is a leading audit, risk, ESG, and InfoSec platform trusted by over 50% of the Fortune 500. The company has been named one of the 500 fastest-growing tech companies in North America for seven consecutive years, fostering a culture of innovation and collaboration.

Canada, US

  • Own customer solutions end-to-end, rapidly prototyping and deploying solutions in live operational environments.
  • Build trusted relationships from IC level to executive sponsor, becoming the technical face of the company.
  • Operate as part of a tight, multi-disciplinary unit with focus and urgency, handing each task to whoever has the best-matched skills.

Kinaxis is a global leader in modern supply chain orchestration, powering complex global supply chains with an AI-infused platform. With over 2000 employees worldwide and 6 global offices, it has been recognized with several Top Employer awards and fosters a culture focused on technology, customers, and innovation.

US, Canada

  • Investigate and apply novel algorithms for generation and editing of video, sound, and 3D visual geometry, including 3D human motion.
  • Analyze and mitigate ethical flaws in generative models, focusing on memorization detection, concept erasure, and data attribution.
  • Publish findings at top-tier conferences and receive support from internal scientists and engineers.

Sony AI is Sony's research organization using AI to unleash human creativity, collaborating with business units like Sony Interactive Entertainment and Sony Music. With some 900 million Sony devices worldwide and a vast entertainment portfolio, it delivers experiences globally.