Source Job

US

  • Analyze ML models to identify and resolve performance bottlenecks.
  • Incorporate OSS tools that enable ML engineers to profile and optimize models self-sufficiently.
  • Deliver solutions to streamline model deployment across various hardware platforms.

C++ Python CUDA PyTorch

20 jobs similar to Staff Software Engineer, ML Acceleration

Jobs ranked by similarity.

Europe US

  • Design and build training pipelines, fine-tuning workflows, and RL infrastructure.
  • Implement data ingestion and curation systems, inference services, and scalability and backend architecture.
  • Own the platform that turns models into production systems.

Fastino is building the next generation of LLMs, with a team of alumni from Google Research, Apple, Stanford, and Cambridge. Fastino's GLiNER family of open source models has been downloaded more than 5 million times and is used by companies such as NVIDIA, Meta, and Airbnb.

Canada

  • Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets.
  • Identify and resolve bottlenecks in the training pipeline to maximize GPU utilization and reduce training time.
  • Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks.

Serve Robotics is reimagining how things move in cities. Their personable sidewalk robot is their vision for the future; it's designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. Their team is agile, diverse, and driven, aiming to grow robotic deliveries from surprising novelty to efficient ubiquity.

Europe

  • Build scalable Edge infrastructure, designing and maintaining delivery systems for model deployment.
  • Work with cross-functional teams to integrate complex features, translating research into hardware realities.
  • Drive automation and reliability by implementing infrastructure to test models and monitor performance.

Hudl builds great teams and hires the best to ensure employees are working with people they can constantly learn from. They provide a culture where everyone feels supported and have been named one of Newsweek's Top 100 Global Most Loved Workplaces.

Europe Unlimited PTO

  • Design, build, and maintain the inference infrastructure that powers Sword Health's AI products.
  • Own the end-to-end deployment pipeline for AI models, from real-time computer vision to large language models.
  • Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies and resource scheduling.

Sword Health is shifting healthcare from human-first to AI-first through its AI Care platform. They make world-class healthcare available anytime, anywhere, while significantly reducing costs. Sword Health has over 1,000 enterprise clients and has raised more than $500 million from leading investors.

$153,200–$183,300/yr
US Canada

  • Develop and train deep learning models for camera-based perception.
  • Implement production-quality machine learning code to support model training, evaluation, and inference.
  • Analyze model performance across diverse driving scenarios to improve robustness and generalization.

Torc Robotics focuses on developing self-driving vehicle technology. They aim to make roads safer and improve lives by commercializing autonomous trucks, offering advanced driver assistance systems and self-driving solutions, with a focus on safety and reliability.

US Canada

  • Develop and train machine learning models for learned behavior systems.
  • Implement production-quality ML code to support model training, evaluation, and inference.
  • Analyze model performance, identify failure modes, and propose improvements to increase robustness.

Torc Robotics develops behavior models that power decision-making for autonomous trucks. The job posting does not state the company's size, but it suggests a highly collaborative environment.

US Canada

  • Develop and train computer vision and deep learning models for road‑lane detection using monocular and multimodal sensor data.
  • Build 3D road surface and lane geometry models in BEV space and integrate them into Torc’s autonomy pipeline.
  • Analyze model performance, identify corner cases, and improve robustness under diverse environmental and long‑tail conditions.

Torc Robotics focuses on developing autonomous driving technology, particularly for trucking applications. They aim to create safe and reliable autonomous systems. The company fosters collaboration between expert teams to push the boundaries of autonomous vehicle capabilities.

US Unlimited PTO

  • Design, build, and maintain ML infrastructure across training, evaluation, serving, and monitoring.
  • Own data pipelines including generation, cleaning, validation, and versioning.
  • Build and improve experiment tracking, orchestration, and reproducibility tooling.

Quilter is helping electrical engineers save time and accomplish more by automating the tedious and time-consuming task of designing printed circuit boards (PCBs). Their small team is composed of experts in electrical engineering, electromagnetic simulation, ML/AI, and high-performance computing (HPC).

$190,000–$240,000/yr
US

  • Scope and lead ML initiatives end-to-end from identifying opportunities through production deployment.
  • Design, develop, and optimize ML models and AI systems for document processing and automation.
  • Build and maintain production ML pipelines that are robust, observable, and scalable.

Medallion is a healthcare technology company building a provider operations platform to eliminate administrative bottlenecks. They are one of the fastest-growing healthcare technology companies, with a mission to transform healthcare at scale, and are backed by $130M in funding.

South Africa

  • Develops software using the KnowBe4 Software Development Lifecycle and Agile Methodologies.
  • Designs, develops, and researches Machine Learning systems.
  • Transforms data science prototypes by applying appropriate Machine Learning algorithms and tools.

KnowBe4 is a global leader in Human Risk Management, trusted by over 70,000 organizations worldwide. They secure employees and AI agents and pioneer a new era of security with AI-powered and market-leading solutions, combining risk intelligence, technical defenses, and personalized training.

$150,000–$200,000/yr
US

  • Evaluate and select GPU computing technologies and frameworks.
  • Design and implement the GPU computing layer within our desktop software stack.
  • Port mesh processing algorithms and optimize components from CPU implementations to GPU.

Velo3D enables on-demand manufacturing of production-quality metal parts with design freedom and quality control. They are an award-winning solution company that believes in transparency and recognizing exceptional efforts, with company benefits including healthcare coverage and 401(k) employer contributions.

US Canada

  • Develop and train machine learning models for scene understanding.
  • Implement production-quality ML code to support model training.
  • Analyze model performance, identify failure modes, and propose improvements.

Torc Robotics is dedicated to developing autonomous driving technology. They aim to revolutionize the trucking industry with safe and efficient self-driving solutions.

Australia

  • You’ll design, build, and maintain scalable systems for serving machine learning models in production.
  • You’ll optimise inference performance, including latency, throughput, and cost efficiency.
  • You’ll collaborate with ML researchers and engineers to productionise models.

Canva is a design platform that enables users to create a variety of visual content. They have campuses in Sydney and Melbourne, with co-working spaces in other Australian cities, and promote a flexible work environment.

$150,000–$180,000/yr
US

  • Design and implement reliable, maintainable, and scalable GenAI systems.
  • Serve as a subject matter expert for machine learning systems owned by the team.
  • Mentor junior and mid-level engineers through code reviews and design collaboration.

Trajector specializes in medical evidence services, guiding clients through disability benefits complexities. They are a global team of over 1,800 dedicated individuals, streamlining the path to benefits and ensuring access to rightful compensation for those with disabilities.

US

  • Build and deploy end-to-end AI/ML solutions, from data pipelines and feature engineering to model training and inference.
  • Develop and maintain data pipelines for ingesting, transforming, and preparing data for analytics and machine learning.
  • Write clean, modular, and maintainable code to support scalable AI applications.

Eimagine fosters a remote-enabled environment where their people can thrive. They are a team of professionals who take pride in their craft, continuously learn, and support one another, helping clients navigate technology and business change while delivering meaningful outcomes.

Australia

  • Developing ranking and recommendation models that identify high-performing team designs.
  • Building brandification pipelines to conform to an organisation's brand guidelines.
  • Building layout extraction and understanding systems that parse Canva's design format.

Canva is a design platform that makes it easy for anyone to create professional-looking designs. They have a flagship campus in Sydney, a second campus in Melbourne, and co-working spaces in Brisbane, Perth, and Adelaide, and provide flexibility in how and where you work.

US

  • Design and maintain robust ML deployment pipelines to ensure seamless model delivery.
  • Automate model training, deployment, and monitoring workflows to increase operational efficiency.
  • Collaborate closely with Data Scientists and Engineering teams to integrate models into production environments.

Truelogic is a leading provider of nearshore staff augmentation services, headquartered in New York. With more than 600 highly skilled tech professionals based in Latin America, they drive digital disruption by partnering with U.S. companies on their most impactful projects.

Global

  • Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training.
  • Serve as the primary technical point of contact for customers running large-scale training workloads.
  • Build production-grade automation for cluster provisioning, GPU health checks, job scheduling, self-healing, and firmware/driver lifecycle management.

Andromeda Cluster gives early-stage startups access to scaled AI infrastructure. They work with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most and are expanding to find the brightest in AI infrastructure, research and engineering.

US

  • Design and develop machine learning infrastructure, tooling, and models to help teams deliver world-class experiences.
  • Help product and development teams understand the data lifecycle and the inherent experimental nature of machine learning.
  • Build internal products and platforms to enable teams to incorporate AI into their features and customer facing products.

Weave provides an all-in-one platform for small businesses to streamline communications and patient experiences. The company has a phenomenal culture, and Weave's teams are cross-functional agile teams composed of a product owner, backend and frontend devs, and devops.

North America Unlimited PTO

  • Design and implement AI-powered systems using a mix of classical ML techniques and modern LLM-based approaches.
  • Apply a range of techniques, from classical ML to LLM-based approaches, with a strong focus on reliability, performance, and maintainability.
  • Collaborate closely with product managers and designers to deliver high-quality, customer-focused features.

Optro is a leading audit, risk, ESG, and InfoSec platform, exceeding $300M ARR and experiencing continuous growth. They empower over 50% of the Fortune 500 with their award-winning technology, fostering innovation and customer satisfaction, and are recognized as one of North America's fastest-growing tech companies.