Source Job

  • Benchmark FP8 quantization across GPU families and ship a production config to achieve speedup.
  • Evaluate serving frameworks with speculative decoding to improve performance.
  • Build a fine-tuning pipeline to enable faster model training and deployment.

Python LLM CUDA

20 jobs similar to Model Performance Engineer

Jobs ranked by similarity.

APAC

  • Partner directly with customer engineering teams running training and inference workloads in production.
  • Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems.
  • Identify recurring patterns across customer issues and drive long term reliability improvements.

Lightning AI is the company behind PyTorch Lightning, building an end-to-end platform for developing, training, and deploying AI systems. They serve solo researchers, startups, and large enterprises, operating globally with offices in New York City, San Francisco, Seattle, and London.

Global

  • Own and operate GPU and accelerator clusters for AI training, inference, and experimentation, ensuring reliability and cost-efficiency.
  • Build and optimize scheduling, orchestration, and serving systems using frameworks like vLLM and Triton to improve latency, throughput, and memory efficiency.
  • Partner with ML engineers to remove workflow bottlenecks and build observability for GPU utilization, capacity, and incident response.

Kraken is a crypto exchange platform building premium financial products for traders and institutions, accelerating global crypto adoption. It is a mission-driven, fully remote company with a world-class team of crypto experts spread across more than 70 countries.

$100,000–$200,000/yr
Global

  • Improve prompts, model selection, and tool usage so the system gets more decisions right over time.
  • Reduce latency, token usage, and cost while preserving decision quality and operational reliability.
  • Design validation, retries, and human review paths for ambiguous, adversarial, incomplete, or conflicting inputs.

Risk Labs is the core team behind UMA and Across, building infrastructure that pushes crypto forward. They value ownership, curiosity, thoughtful risk-taking, and direct communication.

US

  • Own the messaging and content that defines MinIO's role in the NVIDIA AI Factory across NVIDIA products.
  • Develop the technical positioning and content for MinIO's integrations with NVIDIA technologies.
  • Build solutions content that shows how MinIO and NVIDIA infrastructure solve specific customer problems.

MinIO is the industry leader in high-performance object storage. It is the company behind the world’s fastest, most widely deployed object store, powering production infrastructure for more than half of the Fortune 500. The enterprise offering, AIStor, is engineered to handle the scale, speed, and pressure of modern AI and analytics, from terabytes to exabytes, all in a single namespace.

US

  • Identify high-leverage opportunities where AI improves customer outcomes, not where it’s trendy.
  • Ship working prototypes, not slide decks; move in days, not quarters.
  • Define success metrics tied to customer value, and then hold yourself to them.

Clipboard's mission is to uplift as many communities as possible through an app-based marketplace connecting healthcare professionals with workplaces. Founded in 2016, they are a remote-first team of over 1,000 people and a top Y Combinator company, profitable since 2022.

ML Researcher

Fal

  • Spot products and features that are missing in the current market.
  • Work backwards to develop new methods to solve customers' problems.
  • Consider the expected return on investment of different approaches.

Fal is the generative media ecosystem powering the next generation of AI products. Fal builds the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise.

Poland

  • Design and deploy GPU cluster architectures using tools like Ansible, Terraform, Kubernetes, and Slurm.
  • Lead technical deep-dives, workshops, and present solutions to stakeholders, translating complex concepts.
  • Automate provisioning and monitoring with Infrastructure as Code, and produce documentation and training materials.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. The company collaborates with leading technology partners and employs over 550 professionals building foundational technologies.

$160,000–$240,000/yr
US

  • Build agentic AI systems that change how Dataiku runs internally.
  • Turn real problems into working software.
  • See your solutions through from first conversation to production.

Dataiku is the Platform for AI Success, the enterprise orchestration layer for building, deploying, and governing AI. The world’s leading companies rely on Dataiku to operationalize AI and run it as a true business performance engine delivering measurable value.

US

  • Shape technical direction and architecture: Define the foundational architecture for enterprise agentic AI at Benchling.
  • Build and ship the early portfolio yourself: Write production code at least half your time, particularly during the team's first year.
  • Design for enterprise from day one: Build for multi-tenant isolation, secrets management, audit logging, payload encryption, role-based access controls, and human-in-the-loop controls calibrated to risk.

Benchling is the AI platform for biotech R&D. Scientists use Benchling to design experiments, capture structured data, and run AI agents and models directly in their workflows. They have over 200,000 scientists around the world, from academic labs to Sanofi and Moderna.

$180,000–$240,000/yr
US

  • Be part of the alignment research team, working on projects selected for their high upside potential and under-resourced status.
  • Do real alignment research with real autonomy, in directions most organizations aren’t set up to pursue.
  • Break complex problems into concrete experiments and execute on them, independently or with a team.

AE Studio is a 160-person, fully bootstrapped ML consultancy that has spent over a decade building and shipping AI systems for clients. Without outside investors, they put money into alignment research through the AI Alignment Foundation, a nonprofit they founded to scale this work.

$180,000–$225,000/yr
US

  • Instrument fal's core infrastructure to capture CPU, GPU, and request-level signals.
  • Build ingestion pipelines from partner APIs, compute vendors, and internal services into BigQuery.
  • Design and operate the ETL backbone that powers cost, margin, and usage analytics.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production at scale.

$91,250–$127,750/yr
Canada

  • Develop AI systems that automate dispute and chargeback handling using structured evidence and business logic, creating a better experience for our customers.
  • Build models that automate refunds, getting money back to our customers faster.
  • Build and maintain evidence extraction pipelines that process unstructured data using LLM-powered workflows to produce structured, actionable outputs.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company with competitive benefits and focus on an inclusive interview experience.

Europe North America 7w PTO

  • Design a Python framework for implementing internal and public benchmarks.
  • Build and maintain a pipeline that runs distributed evaluations at scale.
  • Collaborate with modeling and product teams to improve experimentation and evaluation tooling.

Poolside aims to be the leading company in building a world where AI drives economically valuable work and scientific progress. They are a remote-first team across Europe and North America, gathering monthly in person for 3 days and twice a year for longer offsites.

North America

  • Architect the migration of the existing compiler flow into MLIR, defining dialects, passes, and lowering strategies.
  • Build conversion paths between MLIR and Mythic’s custom low-level IR to keep both flows operational during migration.
  • Define validation infrastructure within MLIR, including interpretation or execution paths for simulation and debugging.

Mythic is building the future of AI computing with breakthrough analog technology that delivers high performance at low power and cost. They have raised over $100M from world-class investors and secured multi-million-dollar customer contracts across multiple markets.

US

  • Owns the technical direction for large-scale machine learning models, guiding the development of advanced deep learning architectures and high-impact ML systems.
  • Partners with leadership to define ML roadmaps and drive innovation in scalable model design and training approaches.
  • Ensures efficient, reliable deployment of ML models in production and mentors the team to strengthen its technical capabilities.

Reddit is a community-driven platform where users submit, vote, and comment on topics of interest. With over 100,000 active communities and approximately 126 million daily active unique visitors, it is one of the internet’s largest sources of information.

India

  • Design and ship agentic systems and multi-step LLM workflows using Claude, OpenAI, or equivalent, including tool use, memory, structured output extraction, and failure handling.
  • Build and maintain MCP integrations connecting internal tools, portco systems, and external data sources into reliable, observable pipelines.
  • Write production-grade Python for data pipelines, integration scripts, and scheduled jobs running via BullMQ-backed queues on the Node/TypeScript stack.

Emergence is a PE holdco backed by the Pritzker Organization focused on acquiring and scaling B2B SaaS businesses. It combines operational rigor with a growth equity mindset to drive ARR growth and profitability across its portfolio.

$194,000–$228,000/yr
US

  • Design, build, and ship LLM-powered features and agentic workflows for Gametime users.
  • Build and maintain evaluation frameworks and prompt testing pipelines for AI-powered experiences.
  • Contribute to the orchestration layer, including agent routing, tool use, and multi-step workflow coordination.

Gametime helps people connect through shared live experiences. They operate platforms on iOS, Android, mobile web, and desktop, supporting over 60,000 events across the US and Canada, fostering a collaborative and inclusive environment where diverse perspectives are valued.

US

  • Work directly with business and technical stakeholders to identify high-value AI use cases and translate business problems into executable technical solutions.
  • Design and build enterprise-grade Claude-enabled applications, agentic workflows, workflow copilots, knowledge assistants, and decision-support systems.
  • Help enterprise clients rationalize Claude licensing types, evaluate usage models, and design an overall licensing strategy aligned to adoption, governance, cost management, and business value.

Aimpoint Digital is a market-leading data, AI, analytics, and operations research advisory and solution engineering firm. They help organizations design, build, and operationalize enterprise-grade data and AI platforms, decision intelligence solutions, optimization systems, and production AI applications.

$3,850/yr
US UK Canada

  • Fellows will use external infrastructure to work on an empirical project aligned with research priorities.
  • Projects aim to produce a public output, such as a paper submission.
  • Fellows receive mentorship and can access a shared workspace in Berkeley or London.

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. Their team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

$15/hr
US

  • Identify and label languages and dialects from model-generated responses.
  • Review outputs from two different AI models and determine which model correctly identified the proposed language.
  • Compare model responses and select the appropriate evaluation outcome from predefined options.

RWS – TrainAI is looking for Language Data Annotators. They embrace DEI, promote equal opportunity, and prohibit discrimination and harassment of any kind.