Source Job

$155,584–$320,320/yr
US

  • Scale the decision-making process for tools for the tvScientific AI team, from our workflows to our training infrastructure to our Kubernetes deployments
  • Improve the developer experience for the data science team
  • Upgrade our observability tooling

Linux Terraform Python Scala

20 jobs similar to Sr. ML Ops Engineer

Jobs ranked by similarity.

$107,000–$145,000/yr
Canada

  • Support the full operational lifecycle of both traditional machine learning systems and emerging generative-AI-driven applications.
  • Enable scalable training, evaluation, deployment, and monitoring for a wide range of ML and GenAI workloads.
  • Manage model upgrades, framework versions, regression testing, and maintenance tasks, and maintain performance across systems and solutions.

Achievers' employee recognition and rewards platform empowers organizations to build cultures where people feel seen and valued, every day. They're a team of passionate, thoughtful builders with more than 4.3 million users across 190 countries, who care deeply about their product, their customers, and each other.

$117,180–$154,588/yr
Canada

  • You will work to build, maintain and improve our Torc ML frameworks.
  • You have built ML solutions that have reached production.
  • You want to build, maintain, grow, and improve our ML platform.

Torc has been a leader in autonomous driving since 2007. Now a part of the Daimler family, they are focused solely on developing software for automated trucks to transform how the world moves freight.

$84,153–$141,597/yr
Europe Unlimited PTO

  • Build scalable Edge infrastructure, designing, developing, and maintaining delivery systems to deploy models to fleets of devices.
  • Work with cross-functional teams, collaborating with Data Scientists, Embedded Engineers and Product Managers to ensure smooth integration of complex features and capabilities.
  • Drive automation and reliability, implementing infrastructure to silently test candidate models on production devices, and build telemetry pipelines to monitor drift.

Hudl builds great teams and hires the best of the best to ensure you’re working with people you can constantly learn from. They work hard to provide a culture where everyone feels supported, and their employees feel it, helping them become one of Newsweek's Top 100 Global Most Loved Workplaces.

Europe 5w PTO

  • Design, implement, and manage AI Platform architecture.
  • Control AI-related costs, including models, GPUs, and other resources.
  • Collaborate with ML teams to operationalize AI models and integrate them into systems.

Docplanner empowers patients by letting them leave and read reviews about their visits, and provides doctors with the technology to manage bookings easily and save time. They are leaders in 13 countries with 2,500+ employees globally and maintain a startup mindset.

UK

  • Act as the overall technical authority for the programme, owning architectural decisions, execution patterns, and technical quality across all workstreams.
  • Define and enforce standard migration patterns for moving ML workloads from Databricks into AWS SageMaker, while managing exceptions for complex or legacy cases.
  • Lead and contribute across areas such as AWS SageMaker-based ML execution, Databricks to SageMaker migration, and Python-based ML workloads.

CreateFuture is a digital consultancy that builds digital products and services. They have over 500 people and a safe, supportive, and friendly culture.

EMEA

  • Design and implement tooling that enables researchers to quickly deploy and evaluate new models in production
  • Design, build, and maintain high-performance, cost-efficient inference pipelines, making architectural decisions about scaling, reliability, and cost trade-offs
  • Proactively identify and resolve infrastructure bottlenecks, proposing and scoping improvements to iteration speed and production reliability

AssemblyAI builds best-in-class Speech AI models that power the next generation of voice applications. They are a remote team building one of the next great AI companies where teammates define and build their company culture.

$170,000–$240,000/yr
US Unlimited PTO

  • Own SentiLink’s real-time ML model monitoring domain.
  • Own our ML experimentation, model tracking, and versioning infrastructure.
  • Drive improvements to the model development process.

SentiLink provides identity and risk solutions for secure transactions. They are backed by investors like Craft Ventures and Andreessen Horowitz, recognized by Forbes Fintech 50, and have offices across the U.S. and India.

$155,584–$320,320/yr
US

  • Write production Python that powers real-time bidding, model training, and campaign optimization
  • Train, deploy, and monitor ML models that decide which ads to show, when, and at what price: millions of bid decisions per second
  • Build and improve our incrementality measurement systems: helping advertisers understand the true causal lift of their CTV spend

tvScientific is the first and only CTV advertising platform purpose-built for performance marketers. We leverage massive data and cutting-edge science to automate and optimize TV advertising to drive business outcomes.

US Canada 3w PTO 20w maternity

  • Design, build, and maintain machine learning model productionization infrastructure.
  • Streamline model training, validation, and deployment in collaboration with the data science team.
  • Implement robust monitoring and alerting for model performance, drift, and data quality.

The Athletic delivers in-depth coverage of sports, teams, and athletes. Their newsroom of 500+ full-time staff covers hundreds of professional and college teams across North American markets and football clubs.

Global Unlimited PTO

  • Improve the culture of shipping and accountability.
  • Drive adoption of AI tools for development and integration.
  • Rapidly grow the software engineering team.

Recast is looking for a highly motivated and entrepreneurial Director of Engineering to help them accelerate their engineering and product development practice. They are a fully remote team with members in 6+ countries, committed to building a diverse team.

US

  • Work with and debug our current stack (GitHub, Jenkins, GitOps, k8s, Flux CD, Databricks) and help design our future self-service stack (i.e. Argo CD).
  • Cut deep into systems to drive directly to a resolution and develop the source code to make it possible.
  • Face a variety of challenges that will allow you to constantly expand your repertoire by always learning new complex systems and skills.

Integral Ad Science (IAS) is a global technology and data company that builds verification, optimization, and analytics solutions for the advertising industry. IAS is committed to diversity and inclusiveness and encourages women, people of color, members of the LGBTQIA community, people with disabilities and veterans to apply.

  • You own uptime, observability, incident response, and root cause analysis.
  • Own the AWS architecture.
  • Make ML pipelines reliable.

Ferra is building AI infrastructure for structural steel estimation. They process large-scale construction drawing PDFs, run computer vision + LLM pipelines, and generate structured steel graphs, takeoffs, and export-ready models. The team is small and technical, which means high ownership, fast decisions, and work that has a direct impact on the core product.

US Unlimited PTO

  • Influence the technical direction for infrastructure and platform capabilities that support our rapidly growing AI product suite.
  • Architect and evolve our cloud infrastructure (primarily on AWS) to support current and future products.
  • Mentor and level up engineers across Platform and product teams; review design docs, guide architecture decisions, and model high standards.

Rad AI is on a mission to transform healthcare with artificial intelligence. Our AI-driven solutions are revolutionizing radiology: saving time, reducing burnout, and improving patient care. Rad AI has secured over $140M in funding and is valued at $528M.

North America 4w PTO

  • Partner with stakeholders to tackle technical problems at scale, building framework-agnostic services.
  • Establish roadmap and architecture for Wealthsimple’s Machine Learning platform.
  • Build highly performant, scalable systems, contributing to our ML platform on Kubernetes, Bedrock and SageMaker.

Wealthsimple aims to provide financial freedom by making financial services transparent and low-cost. As the largest fintech company in Canada, with over 1,500 employees, they manage over $100 billion in assets and foster a collaborative and quality-focused culture.

US

  • Analyze requirements and propose innovative AI-native solutions to technical problems
  • Write clean, scalable code
  • Test and deploy features & services

WorkHero is building the AI-powered back office for the skilled trades, starting with the $50B+ HVAC industry. They have exciting traction and just closed a $5M seed round to expand their engineering and product organization and to add services.

$120,000–$205,000/yr
US

  • Build ETL pipelines.
  • Integrate with APIs hosted on a GenAI platform.
  • Focus on DevOps/SRE.

Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers. Their exceptional team thrives in a culture of collaboration, creativity, and continuous learning.

Australia New Zealand

  • Building world-class AI infrastructure to support a 100+ person research team.
  • Designing and scaling multi-cloud systems that support high-performance model training and inference.
  • Improving monitoring, alerting and system observability for AI workloads.

Canva is redefining how the world experiences design. It has campuses in Sydney and Melbourne, and co-working spaces in other major cities, trusting employees to choose the balance that empowers them and their team to achieve their goals.

US Unlimited PTO

  • Engaging directly with current and prospective clients to understand business needs, translate them into technical requirements, and communicate findings in a clear, actionable way
  • Partnering with internal and client stakeholders to shape solutions, develop proposals, and contribute to go-to-market initiatives
  • Design, develop and deploy efficient data pipelines for both structured and unstructured data

Resultant consists of a team of engineers, mathematicians, data analysts, project managers, and business consultants. They partner with clients in the public and private sectors to help them overcome complex challenges, empowering clients to drive meaningful change.

US

  • Define and evolve the technical vision for AI and agentic systems across products.
  • Design orchestration, data, and serving patterns that handle global scale with reliability.
  • Collaborate with AI Research to turn prototypes into extensible, governed production frameworks.

KnowBe4 is a cybersecurity company that puts security first, empowering over 70,000 organizations worldwide to strengthen their security culture. They value radical transparency, extreme ownership, and continuous professional development in a welcoming workplace that encourages all employees to be themselves.

Europe

  • Designing, architecting, and implementing modern, secure Azure AI platforms.
  • Enabling Data Science teams by building the "paved road" for deploying Azure ML Workspaces and GenAI services.
  • Automating model retraining, versioning, and deployment to inference endpoints using Azure DevOps.

Nordcloud is a European leader in cloud implementation, application development, managed services and training. It is a recognized cloud-native pioneer with over 1,300 employees and has delivered over 1,000 successful cloud projects for companies ranging from midsize to large corporates. Nordcloud values diversity and is dedicated to providing equal opportunities for all candidates and employees.