Source Job

$107,000–$145,000/yr
Canada

  • Support the full operational lifecycle of both traditional machine learning systems and emerging generative-AI-driven applications.
  • Enable scalable training, evaluation, deployment, and monitoring for a wide range of ML and GenAI workloads.
  • Manage model upgrades, framework versions, regression testing, and maintenance tasks, and maintain performance across systems and solutions.

GCP Python Terraform CI/CD

20 jobs similar to ML Ops Engineer

Jobs ranked by similarity.

Global

  • Design, implement, and maintain high-performance ML training and inference platforms.
  • Ship tools that allow any ML engineer to deploy a model in minutes, not days.
  • Improve scalability, reliability, and cost efficiency of model training and serving systems.

Speechify's mission is to make sure that reading is never a barrier to learning. With nearly 200 people around the globe working in a 100% distributed setting, Speechify's team includes frontend and backend engineers, AI research scientists, and others.

Europe

  • Design, implement, and maintain robust, containerized, and reproducible pipelines for model training, evaluation, and deployment—across both batch and real-time settings.
  • Build and manage ML services, APIs, and model serving infrastructure using tools like MLflow, Amazon SageMaker, and Feature Store.
  • Set up and maintain monitoring, observability, and alerting systems to ensure high availability and performance (including model/data drift, feature logging, and inference latency).

AUTO1 Group Technology drives innovation in the used car market across Europe. They operate at the intersection of software engineering, data science, and DevOps, helping bring state-of-the-art ML models—such as large-scale recommendation systems and transformer-based neural networks—safely into production.

US

  • Design and implement MLOps pipelines to automate model training, deployment, monitoring, and management
  • Lead/mentor a team of MLOps Engineers, fostering an inclusive and collaborative environment that encourages innovation and continuous learning
  • Collaborate with Data Scientists and ML Engineers to ensure models are production-ready, scalable, and maintainable

Egen is a fast-growing and entrepreneurial company with a data-first mindset. They bring together the best engineering talent working with the most advanced technology platforms, including Google Cloud and Salesforce, to help clients drive action and impact through data and insights.

Global 5w PTO

  • Design, develop, and deploy robust ML systems and multi-model AI agents that solve real-world retail challenges.
  • Lead the entire lifecycle, including prototyping, deployment, monitoring, and maintenance using modern CI/CD and containerisation practices.
  • Build high-performance data pipelines (ETL/ELT) for both training and real-time inference, ensuring our systems are scalable and reliable.

EDITED is the world’s leading AI-driven retail intelligence platform. They empower the world’s most successful brands and retailers with real-time decision-making power. Their environment is dynamic and supportive, encouraging team members to take initiative, innovate, and continuously grow.

US

  • Design, develop, and deploy AI/ML models and pipelines that meet mission and performance objectives.
  • Build, train, and fine-tune models using frameworks such as PyTorch, TensorFlow, scikit-learn, Hugging Face, and LangChain.
  • Write clean, efficient Python code for data ingestion, feature engineering, embeddings, and inference services.

Frontier Technology Inc. (FTI) delivers mission-focused solutions to the Department of Defense (DoD/DoW) and Intelligence Community (IC) through advanced engineering, digital transformation, and program execution expertise. They help their customers solve complex challenges and achieve mission success by integrating people, process, and technology.

US

  • Design and build scalable and efficient ML models.
  • Serve as the go-to expert for all machine learning-related inquiries.
  • Conduct in-depth research to stay at the forefront of the field.

Game Plan Tech empowers public sector organizations with best-in-class Google solutions. They foster a collaborative environment where you can make a significant impact, drive innovation for their clients, and advance your career.

North America Unlimited PTO

  • Build and operate scalable backend services and internal APIs for the AI platform.
  • Integrate LLMs and AI tool execution into reliable, production-ready workflows.
  • Own production reliability for AI platform infrastructure through observability, alerting, and incident response.

MaintainX is the world's leading Asset and Work Intelligence platform for industrial and frontline environments. It is a modern, IoT-enabled, cloud-based tool for reliability, safety, and operations on physical equipment and facilities, powering operational excellence for 13,000+ businesses. MaintainX recently completed a $150 million Series D round, at a valuation of $2.5 billion.

Global

  • Deploy and manage AI agents and multi-agent workflows
  • Configure and enforce access control, permissions, and knowledge boundaries
  • Maintain governance standards and audit trails

SPACE44 builds and operates software systems for companies that need technology to work reliably in real, day-to-day operations. They work as long-term engineering partners, embedding experienced engineers into client environments and taking responsibility for execution, stability, and ongoing improvement of production systems.

Australia

  • Support and evolve the reliability of platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.

Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. They have raised $150 million in Series D funding, quadrupling their valuation to $2.25 billion, and are investing in their market-leading platform.

Australia New Zealand

  • Building world-class AI infrastructure to support a 100+ person research team.
  • Designing and scaling multi-cloud systems that support high-performance model training and inference.
  • Improving monitoring, alerting, and system observability for AI workloads.

Canva is redefining how the world experiences design. It has campuses in Sydney and Melbourne, and co-working spaces in other major cities, trusting employees to choose the balance that empowers them and their team to achieve their goals.

US

  • Design machine learning solutions and execute projects from proof-of-concept to production.
  • Collaborate with business representatives to gather and understand requirements.
  • Oversee all project phases including problem definition, data annotation, and training documentation.

Jobgether is a platform that connects job seekers with companies. They use AI-powered matching to ensure applications are reviewed fairly.

US Canada 3w PTO 20w maternity

  • Design, build, and maintain machine learning model productionization infrastructure.
  • Streamline model training, validation, and deployment in collaboration with the data science team.
  • Implement robust monitoring and alerting for model performance, drift, and data quality.

The Athletic delivers in-depth coverage of sports, teams, and athletes. Their newsroom of 500+ full-time staff covers hundreds of professional and college teams across North American markets and football clubs.

$84,153–$141,597/yr
Europe Unlimited PTO

  • Build scalable Edge infrastructure, designing, developing, and maintaining delivery systems to deploy models to fleets of devices.
  • Work with cross-functional teams, collaborating with Data Scientists, Embedded Engineers and Product Managers to ensure smooth integration of complex features and capabilities.
  • Drive automation and reliability, implementing infrastructure to silently test candidate models on production devices, and build telemetry pipelines to monitor drift.

Hudl builds great teams and hires the best of the best to ensure you’re working with people you can constantly learn from. They work hard to provide a culture where everyone feels supported, and their employees feel it, helping them become one of Newsweek's Top 100 Global Most Loved Workplaces.

Canada

  • Define the long-term AI vision and technical direction for the product
  • Design and review architectures and system designs
  • Lead research and experimentation into new AI technologies

Solink provides businesses with tools to transform video security into real-time operational insights. They are a rapidly growing company with over 30,000 locations across 32+ countries and have been recognized by Deloitte’s Fast 50™ and Fast 500™ and as one of Ottawa’s Best Places to Work.

$170,000–$190,000/yr
US

  • Lead the end-to-end machine learning lifecycle.
  • Own and evolve ML Ops architecture, including CI/CD for models.
  • Serve as a player-coach, contributing directly to design reviews.

AvaSure is revolutionizing healthcare with cutting-edge virtual care solutions that protect patients and empower clinical teams. They are proud of their collaborative culture, where innovation thrives and every team member is valued.

$125,000–$156,300/yr
US

  • Design, build, and operate LLM-powered systems used in production.
  • Build scalable agentic AI automation solutions, selecting appropriate patterns based on business requirements.
  • Make system-level tradeoffs across model choice, latency, cost, accuracy, and operational complexity.

Natera is a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women’s health, and organ health, aiming to make personalized genetic testing and diagnostics part of the standard of care. The Natera team consists of highly dedicated statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers and many other professionals from world-class institutions.

Europe

  • Own the Pipeline from Cloud to Edge, re-architecting machine learning model deployment to edge devices.
  • Build Shadow Mode Infrastructure to test candidate models on production devices silently.
  • Drive governance & monitoring by building tooling to monitor model drift and performance from the edge.

Hudl builds great teams and hires the best to foster continuous learning. They provide a supportive culture where employees feel valued, contributing to their recognition as a Top 100 Global Most Loved Workplace by Newsweek.

Europe 5w PTO

  • Guide the technical direction of Bondora’s ML engineering stack by selecting, evaluating, and implementing technologies to improve scalability and reliability.
  • Lead complex, high-risk, or cross-departmental projects that directly influence Data Science delivery, risk model performance, and production stability.
  • Act as the bridge between Data Science, Data Engineering, and Development to identify and solve systemic technical challenges.

Bondora's mission is to empower people to enjoy life more while alleviating the stress of managing finances. Founded in 2008, Bondora has served over 1 million customers over 16 years and is a rapidly growing fintech company, set to acquire a banking license and expand investment and loan products across Europe.

$151,038–$234,109/yr
US

  • Work in a small, cross-functional team of 3-4 people focused on AI/ML systems.
  • Take ownership of projects from ideation to deployment with a high degree of autonomy.
  • Collaborate with product managers and stakeholders to understand customer pain points and deliver impactful solutions.

TriumphPay is building the transportation payments network for the future. Their software touches a combined $37.1B in annualized freight volume. They foster an environment built on exceptional customer service, entrepreneurial spirit, and successful partnerships with their clients.

Europe

  • Designing, architecting, and implementing modern, secure Azure AI platforms.
  • Enabling Data Science teams by building the "paved road" for deploying Azure ML Workspaces and GenAI services.
  • Automating model retraining, versioning, and deployment to inference endpoints using Azure DevOps.

Nordcloud is a European leader in cloud implementation, application development, managed services and training. It is a recognized cloud-native pioneer with over 1,300 employees and has delivered over 1,000 successful cloud projects for companies ranging from midsize to large corporates. Nordcloud values diversity and is dedicated to providing equal opportunities for all candidates and employees.