Source Job

Brazil

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs empowering teams to transition from centralized bottlenecks to self-service excellence.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Kubernetes Python Terraform AWS

20 jobs similar to Senior MLOps Engineer

Jobs ranked by similarity.

US Unlimited PTO

  • Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
  • Build inference infrastructure and observability stacks for LLM-powered workflows.
  • Drive security, compliance, and governance at the systems level in a regulated healthcare environment.

Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. As a publicly traded company on the NYSE (HIMS), it offers flexible/remote work and a culture centered on innovation and employee well-being.

Argentina 18w maternity 12w paternity

  • Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
  • Design and maintain infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
  • Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.

Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.

$81,112–$92,025/yr
Europe

  • Empower ML Engineers with the tools, infrastructure, and frameworks they need to iterate fast autonomously.
  • Accelerate time-to-market for production-ready ML products with seamless integration and access to data and resources.
  • Own ML CI/CD in close collaboration with the DevExp team, adapting existing frameworks to ML-specific needs.

Dailymotion is a video platform designed to broaden users' horizons with a unique algorithm. They foster inclusivity and aim to build a better and safer Internet with cutting-edge solutions for video hosting and advertising. With 400 employees in France, New York, and Singapore, Dailymotion is shaking up the global video platform ecosystem.

Global

  • Contribute to the development of the Everywhere Inference platform, a Kubernetes-based solution.
  • Design and implement APIs and developer tools to simplify deployment, management, and monitoring of AI applications.
  • Optimize serverless container workflows for AI workloads, ensuring performance, scalability, and seamless autoscaling.

Gcore provides infrastructure and software solutions for AI, cloud, network, and security. They have 550+ professionals globally and power everything from real-time communication and streaming to enterprise AI and secure web applications.

Global

  • Develop, deploy, maintain, operate, and support an Agentic AI Developer Platform.
  • Strongly oriented towards technical implementation and operation of the platform with hands-on experience.
  • Collaborate and lend experience to less experienced team members as needed.

We build modern Machine Learning systems for demand planning and budget forecasting, offering custom AI solutions to optimize cloud-based systems. We are a remote startup with a culture that values being data nerds, open team players, ownership, and a positive mindset.

Europe

  • Define and evolve the architecture and roadmap for enterprise‑scale Data and AI platforms.
  • Design and build multi‑tenant, multi‑region, highly available AI platforms with governance.
  • Lead capacity planning and cost optimization strategies for GPU and CPU workloads.

NEORIS accelerates growth in Ibero‑America, combining global engineering with regional expertise. With over 60,000 professionals across 55+ countries, they offer technical specialization career paths and value responsibility, collaboration, creativity, and commitment.

Canada

  • Define, drive, design, and build/ship end-to-end solutions that solve real customer problems.
  • Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
  • Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.

Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2000 employees worldwide and is recognized with Top Employer awards.

SRE

Fal
$180,000–$250,000/yr
US

  • Own and operate our Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and deployment infrastructure.
  • Leverage AI to automate analysis and resolution of production issues.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.

US Unlimited PTO

  • Maintain, improve, and extend an AI platform already running in production.
  • Handle a mix of backend development, data pipelines, DevOps, and infrastructure work.
  • Translate business and product requirements into technical decisions independently.

Provectus is an AI consultancy and solutions provider that helps businesses adopt AI technologies through development and integration services. The job posting does not state the company's size; the culture appears flexible, autonomous, and tech-forward.

Canada

  • Design and operate core AI platform components for training, deploying, and serving ML models at scale.
  • Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
  • Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.

Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly-owned subsidiary of the Mozilla Foundation, the company is mission-driven, employee-owned, and focused on privacy and open standards.

$160,000–$180,000/yr
US Unlimited PTO

  • Identify systemic engineering challenges across our platforms and drive their resolution.
  • Write code, review PRs, debug production issues, and optimize system performance.
  • Partner with engineering teams as a technical point of contact on complex projects.

Zeta Global is an AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to help marketers acquire, grow, and retain customers more efficiently. They were founded in 2007 and are headquartered in New York City with offices around the world.

Global Unlimited PTO

  • Partner with strategic enterprise customers as a trusted technical advisor, guiding them through their Camunda adoption journey with customized technical adoption plans.
  • Deliver hands-on technical guidance on platform architecture, AI-enabled automation, LLM orchestration, agent frameworks, and scalable cloud infrastructure.
  • Proactively remove risks through health checks, escalation management, and cross-functional collaboration with Camunda teams to drive measurable customer outcomes.

Camunda is the enterprise platform for agentic orchestration, enabling organizations to coordinate AI agents, people, and systems across complex business processes. With over 700 organizations worldwide, including 9 of the top 10 US banks, Camunda is a fully remote, global, AI-first organization that has been named to GP Bullhound's Top 100 Next Unicorn list and is Great Place to Work certified.

$145,000–$250,000/yr
US Unlimited PTO

  • Construct infrastructure as code, developing and enforcing best practices across configurations while preventing drift between Terraform configurations and infrastructure deployments.

SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.

US Unlimited PTO

  • Build AI-powered tools into engineers' day-to-day workflows (e.g., Claude, VS Code, GitLab, Datadog, internal documentation, Slack chatops).
  • Implement and evolve inference and tool-calling pathways using Claude models on Amazon Bedrock, LiteLLM, and MCP/tool gateways within Omada's secure networks.
  • Partner with teams across Engineering to define AI-augmented SDLC patterns across planning, coding, testing, and operations.

Omada Health is a virtual-first healthcare and technology company that combines human-led care teams, connected devices, and AI-enabled technology to deliver personalized care at scale, focusing on chronic conditions like obesity, diabetes, and hypertension. They have served more than two million members since launch across 2,000+ employers, health plans, pharmacy benefit managers, and health systems, and are certified as a Great Place to Work.

Global

  • Deploy and maintain infrastructure using Terraform on AWS.
  • Operate and govern production-grade platforms running on Kubernetes / EKS.
  • Build and maintain CI/CD pipelines using GitHub Actions.

Muttdata is a dynamic startup committed to crafting innovative systems using cutting-edge Big Data and Machine Learning technologies. They are looking for a hands-on DevOps engineer to join a strategic initiative focused on deploying and operating Data & AI platforms.

UK

  • Manage and optimise Kubernetes clusters in GKE through Terraform.
  • Design and implement automation strategies that empower developers to self-serve.
  • Serve as the technical point-of-contact for GCP and Kubernetes-related queries.

Prolific builds the human data infrastructure that reshapes AI development by enabling collection of high-quality, ethically sourced human behavioral data. They are a mission-driven company with a competitive salary and benefits, offering remote working within a culture focused on impact and innovation.

India

  • Assist in managing multiregion and multicloud infrastructure, ensuring resiliency, scalability, and performance.
  • Support infrastructure provisioning and deployments primarily on GCP, while gaining exposure to other cloud providers.
  • Collaborate with development teams to design and maintain CI/CD pipelines in GitLab CI and contribute to GitOps-based deployments using ArgoCD.

Learneo is a platform of builder-driven businesses, including Course Hero, CliffsNotes, LitCharts, Quillbot, Symbolab, and Scribbr, united around supercharging productivity and learning. Each team innovates independently, supported by centralized corporate operations functions, and the company values collaboration and growth.

US

  • Design, build, and deploy AI/ML solutions from prototype to production for client business problems.
  • Apply generative AI and LLMs, establishing MLOps best practices including CI/CD and model monitoring.
  • Serve as a trusted technical advisor, translating ambiguous problems into well-scoped solutions and presenting to stakeholders.

DevIQ builds modern cloud and data solutions for mid-market companies focused on energy reduction, healthcare, education, and smart cities. The company offers competitive benefits, a strong team culture, and opportunities to work on end-to-end solutions with multi-disciplinary teams.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

Latin America

  • Build and operate the self-service infrastructure platform for developers and AI agents.
  • Own core platform layers including CI/CD, GitOps, IaC module catalog, and golden-path scaffolding.
  • Build internal tooling, observability, and metrics to make pipelines observable and improvable.

Luxury Presence is building the AI growth platform for real estate. Backed by top investors like Bessemer Venture Partners, we're a Series C company with over $100M in ARR and more than 90,000 real estate professionals using our platform.