Source Job

US, Unlimited PTO, 16 weeks maternity leave, 4 weeks paternity leave

  • Build and operate the ML lifecycle platform, including tooling for experiment tracking, model registry, and versioned pipelines.
  • Own CI/CD and deployment for ML workloads, building automated pipelines from notebook to production.
  • Make models observable and reliable in production with monitoring for latency, drift, data quality, and cost signals.

Python, Kubernetes, Terraform, MLflow, CI/CD

20 jobs similar to MLOps Engineer

Jobs ranked by similarity.

$81,112–$92,025/yr
Europe

  • Empower ML Engineers with the tools, infrastructure, and frameworks they need to iterate quickly and autonomously.
  • Accelerate time-to-market for production-ready ML products with seamless integration and access to data and resources.
  • Own ML CI/CD in close collaboration with the DevExp team, adapting existing frameworks to ML-specific needs.

Dailymotion is a video platform designed to broaden users' horizons with a unique algorithm. They foster inclusivity and aim to build a better and safer Internet with cutting-edge solutions for video hosting and advertising. With 400 employees in France, New York, and Singapore, Dailymotion is shaking up the global video platform ecosystem.

US, Unlimited PTO

  • Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
  • Build inference infrastructure and observability stacks for LLM-powered workflows.
  • Drive security, compliance, and governance at the systems level in a regulated healthcare environment.

Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. As a publicly traded company on the NYSE (HIMS), it offers flexible/remote work and a culture centered on innovation and employee well-being.

  • Design and build scalable ML training, deployment, and inference pipelines using CI/CD and cloud infrastructure.
  • Implement MLOps for model versioning, monitoring, and automated retraining to detect drift and performance degradation.
  • Partner with Data Scientists and Product teams to productionise models and integrate ML into customer-facing products.

We develop solutions that make an impact for companies around the globe. Our culture embraces openness, acts with respect, shows grit & guts, and combines employment with enjoyment.

Brazil

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs empowering teams to transition from centralized bottlenecks to self-service excellence.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.

United States, Canada

  • Build and operate the real-time inference service for the risk decision engine with low latency and high availability.
  • Own model deployment infrastructure including CI/CD, shadow mode, and staged rollouts.
  • Build model observability and partner with Risk Data Science for production operation.

Mercury is a fintech company that provides banking services for startups via partner banks. The company is committed to creating a safe environment and values diversity, with a growing team focused on innovation.

Canada

  • Define, design, build, and ship end-to-end solutions that solve real customer problems.
  • Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
  • Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.

Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2000 employees worldwide and is recognized with Top Employer awards.

  • Own reliability, latency, and performance for AI platform services and data infrastructure on AWS.
  • Design and maintain CI/CD pipelines, infrastructure-as-code, and observability frameworks across the stack.
  • Partner with AI and data engineers to ensure secure, cost-optimized, and scalable deployment of platform components.

HHAeXchange is the leading technology platform for home and community-based care, providing an end-to-end homecare solution for people who are aging or have disabilities. Founded in 2008, the company is passionate about transforming healthcare by connecting patients, providers, managed care organizations, and states.

SRE

Fal
$180,000–$250,000/yr
US

  • Own and operate our Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and deployment infrastructure.
  • Leverage AI to automate analysis and resolution of production issues.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.

US, Canada

  • Assess current pipelines and data architecture to produce a prioritized plan for change.
  • Design durable data and ML systems grounded in customer needs with documented tradeoffs.
  • Harden pipelines, upgrade data architecture, and raise standards for observability and reliability.

FutureFit AI's core mission is to help more people get to better jobs faster and cheaper, with a focus on those facing barriers to opportunity. Its team of 30-50 across the US and Canada fosters a high-trust, high-intensity culture with a will to win.

Global

  • Develop, deploy, maintain, operate, and support an Agentic AI Developer Platform.
  • Focus on hands-on technical implementation and operation of the platform.
  • Collaborate with less experienced team members and share expertise as needed.

We build modern machine learning systems for demand planning and budget forecasting, offering custom AI solutions to optimize cloud-based systems. We are a remote startup whose culture values data nerds, open team players, ownership, and a positive mindset.

Canada

  • Design and operate core AI platform components for training, deploying, and serving ML models at scale.
  • Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
  • Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.

Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly-owned subsidiary of the Mozilla Foundation, the company is mission-driven and focused on privacy and open standards.

US

  • Own cloud infrastructure on Azure, data pipeline orchestration, CI/CD, and observability to ensure production-grade reliability.
  • Build and maintain foundational infrastructure that enables fast engineering velocity without breaking things.
  • Apply SRE principles such as SLOs, capacity planning, incident response, and eliminating toil through automation.

Terzo's platform processes enterprise-scale document corpora, powers real-time AI agents, and serves the Financial Intelligence Graph to Fortune 500 customers. As a small, senior team with strong ownership and minimal bureaucracy, we foster a culture of collaboration, mentorship, and continuous improvement.

Canada

  • Build and maintain infrastructure platforms for over 200 backend services running on Kubernetes clusters with 40,000+ cores.
  • Lead and mentor other engineers, own complex infrastructure failures, and participate in a shared on-call rotation.
  • Drive cloud cost efficiency, estimate schedules, and use AI tools as a first-class collaborator in daily workflows.

Life360's mission is to keep people close to the ones they love through location sharing, safe driver reports, and crash detection. The company serves approximately 97.8 million monthly active users across more than 180 countries and has more than 500 remote-first employees.

US

  • Design, build, and maintain the core infrastructure layer supporting GenAI products.
  • Implement secure access controls and authentication mechanisms integrated by default into the AI platform components.
  • Develop and manage observability, monitoring, and logging solutions for GenAI workloads and infrastructure.

PointClickCare is a healthcare technology company. This team will serve as the product owner for GenAI capabilities, working closely with key horizontal partners to ensure delivery of safe, scalable, and high-impact AI products.

India

  • Assist in managing multiregion and multicloud infrastructure, ensuring resiliency, scalability, and performance.
  • Support infrastructure provisioning and deployments primarily on GCP, while gaining exposure to other cloud providers.
  • Collaborate with development teams to design and maintain CI/CD pipelines in GitLab CI and contribute to GitOps-based deployments using ArgoCD.

Learneo is a platform of builder-driven businesses, including Course Hero, CliffsNotes, LitCharts, Quillbot, Symbolab, and Scribbr, united around supercharging productivity and learning. Each team innovates independently, supported by centralized corporate operations functions, and the company values collaboration and growth.

US, Unlimited PTO

  • Maintain, improve, and extend an AI platform already running in production.
  • Handle a mix of backend development, data pipelines, DevOps, and infrastructure work.
  • Translate business and product requirements into technical decisions independently.

Provectus is an AI consultancy and solutions provider that helps businesses adopt AI technologies through development and integration services. The company appears to foster a flexible, autonomous, and tech-forward culture.

US

  • Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
  • Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
  • Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.

VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.

US

  • Lead design and operation of internal developer platforms and self-service infrastructure.
  • Build and optimize CI/CD pipelines, deployment workflows, and automation across GitHub Actions, Jenkins, ArgoCD.
  • Apply SRE principles to improve developer-facing systems and software delivery performance.

Versant is a media company owning iconic brands in news, sports, and entertainment, including USA Network, Fandango, and Rotten Tomatoes. It is an independent, publicly traded company with a collaborative, inclusive culture and a remote-first work environment.

Europe

  • Design, build, and maintain scalable cloud infrastructure for an AI-powered platform.
  • Manage and optimize AWS environments, develop Infrastructure as Code using Terraform, and build CI/CD pipelines.
  • Troubleshoot production issues and implement security best practices across infrastructure and deployment pipelines.

UK

  • Build and maintain backend services, Python libraries, and model lifecycle tooling for internal ML teams.
  • Design and operate distributed systems for model serving, evaluation, and feature engineering.
  • Focus on developer experience and reliability to help teams train, deploy, and serve ML models safely.

Monzo is on a mission to make money work for everyone, offering personal and business bank accounts, savings, investments, and more through a modern digital banking platform. With around 600 engineers out of roughly 5,000 employees, we value flexibility, collaboration, and open source contributions.