Source Job

Italy

  • Lead and develop a high-performing team of MLOps engineers, fostering technical excellence and collaboration.
  • Define and execute the MLOps roadmap, aligning infrastructure initiatives with research, engineering, and product goals.
  • Design and maintain scalable ML infrastructure including automated training pipelines, CI/CD, and model serving platforms.

Python, Kubernetes, AWS, MLflow, Terraform

20 jobs similar to MLOps Lead Italy

Jobs ranked by similarity.

Global 4w PTO

  • Own the ML serving API and deploy models to production with CI/CD and infrastructure as code.
  • Build monitoring, alerting, and reliability for NBA models and LLM agents.
  • Drive architectural decisions and mentor engineers on MLOps patterns.

Clutch is a vertical SaaS company backed by Andreessen Horowitz, revolutionizing how credit unions engage with members via fintech lending software. The company is small and ambitious, with a lean data team of five that values pragmatism and fast shipping.

Brazil

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs empowering teams to transition from centralized bottlenecks to self-service excellence.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.

India

  • Collaborate with data scientists and engineers to build scalable ML pipelines, troubleshoot infrastructure issues from Linux to Kubernetes, and optimize model performance.
  • Drive high engineering standards, design on-premises MLOps solutions, and maintain tools for deployment and monitoring.
  • Refine CI/CD workflows, incorporate ML model training and evaluation into testing, and ensure seamless handover between research and production.

Learneo is a platform of builder-driven businesses, including Course Hero, CliffsNotes, LitCharts, Quillbot, Symbolab, and Scribbr, focused on supercharging productivity and learning. The company supports high-growth businesses with centralized corporate operations and has a virtual-first culture with employees across multiple countries.

India

  • Collaborate with data scientists and software engineers to build scalable data pipelines and ML deployment systems.
  • Troubleshoot issues across the ML infrastructure stack, from Linux and Docker to Kubernetes and model serving.
  • Drive high engineering standards through code reviews, testing, and CI/CD enhancements.

Quillbot helps students and professionals strengthen their writing with AI-powered tools. We serve over 56 million users globally and foster a collaborative, virtual-first culture.

India

  • Design, build, and maintain scalable machine learning infrastructure on AWS, including training and deployment pipelines.
  • Develop and deploy ML models for recommendation systems, fraud detection, credit risk, and personalization use cases.
  • Implement monitoring, logging, and alerting systems to ensure model performance, stability, and reliability in production.

Our partner is a fast-growing, innovation-driven company where machine learning and AI systems directly power large-scale fintech and commerce experiences. They foster a highly dynamic environment with strong emphasis on experimentation, rapid iteration, and measurable business impact.

US Unlimited PTO 16w maternity 4w paternity

  • Build and operate the ML lifecycle platform, including tooling for experiment tracking, model registry, and versioned pipelines.
  • Own CI/CD and deployment for ML workloads, building automated pipelines from notebook to production.
  • Make models observable and reliable in production with monitoring for latency, drift, data quality, and cost signals.

dv01 provides a data analytics platform for the structured finance market, offering transparency into investment performance and risk for lenders and Wall Street investors. With over 400 clients and coverage of over 100 million loans, dv01 is a data-first company with a diverse and innovative culture.

Germany Unlimited PTO

  • Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
  • Build and operate observability systems while leading incident response and reliability improvements.
  • Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.

This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.

US Unlimited PTO

  • Design and maintain scalable ML infrastructure including data pipelines, training workflows, and model deployment systems.
  • Own end-to-end ML lifecycle operations, ensuring reliable delivery of models into production at scale.
  • Implement monitoring, telemetry, and feedback loops for ML models running across large-scale device fleets.

Our partner company develops ML systems for connected hardware products used by customers worldwide. They operate in a fast-paced, product-driven environment with a collaborative and technically ambitious culture focused on real-world ML impact.

US Unlimited PTO

  • Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
  • Build inference infrastructure and observability stacks for LLM-powered workflows.
  • Drive security, compliance, and governance at the systems level in a regulated healthcare environment.

Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. As a publicly traded company on the NYSE (HIMS), it offers flexible/remote work and a culture centered on innovation and employee well-being.

United States, Canada

  • Build and operate the real-time inference service for the risk decision engine with low latency and high availability.
  • Own model deployment infrastructure including CI/CD, shadow mode, and staged rollouts.
  • Build model observability and partner with Risk Data Science for production operation.

Mercury is a fintech company that provides banking services for startups via partner banks. The company is committed to creating a safe environment and values diversity, with a growing team focused on innovation.

Argentina 18w maternity 12w paternity

  • Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
  • Design and maintain infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
  • Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.

Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.

Canada

  • Design and operate core AI platform components for training, deploying, and serving ML models at scale.
  • Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
  • Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.

Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly-owned subsidiary of the Mozilla Foundation, the company is mission-driven, employee-owned, and focused on privacy and open standards.

  • Design and build scalable ML training, deployment, and inference pipelines using CI/CD and cloud infrastructure.
  • Implement MLOps for model versioning, monitoring, and automated retraining to detect drift and performance degradation.
  • Partner with Data Scientists and Product teams to productionise models and integrate ML into customer-facing products.

We develop solutions that make an impact for companies around the globe. Our culture embraces openness, acts with respect, shows grit & guts, and combines employment with enjoyment.

  • Own reliability, latency, and performance for AI platform services and data infrastructure on AWS.
  • Design and maintain CI/CD pipelines, infrastructure-as-code, and observability frameworks across the stack.
  • Partner with AI and data engineers to ensure secure, cost-optimized, and scalable deployment of platform components.

HHAeXchange is the leading technology platform for home and community-based care, providing an end-to-end homecare solution for people who are aging or have disabilities. Founded in 2008, the company is passionate about transforming healthcare by connecting patients, providers, managed care organizations, and states.

Global Unlimited PTO

  • Lead and scale the Forward Deployed Engineering and Technical Support teams, defining engagement models and operating standards.
  • Own the FDE engagement lifecycle from technical discovery to deployment guidance, ensuring customer value.
  • Drive operational discipline across support tools and partner with Sales, Product, and Engineering on roadmap alignment.

Runpod is the AI Developer Cloud. More than one million developers use the platform to experiment, train, deploy, and scale AI. We are a small, remote-first team; the platform has processed over 20 billion inference requests, and the company has closed a $100M Series A.

Canada

  • Design and develop backend systems using Python or Kotlin for the ML Feature Platform.
  • Build and maintain a self-serve platform for feature creation, exploration, and serving for machine learning and decisioning.
  • Own end-to-end flows including data storage, availability, backfilling infrastructure, and platform improvements.

Affirm is reinventing credit to make it more honest and friendly, offering buy now, pay later solutions without hidden fees or compounding interest. The company is remote-first with a focus on reliability and performance, and it emphasizes a people-first culture.

India

  • Design and automate cloud infrastructure for scalable, secure deployments across public cloud environments.
  • Develop and maintain AI-powered services and cloud-native solutions for enterprise platforms.
  • Build monitoring, alerting, and observability solutions to proactively resolve infrastructure and application issues.

This position is listed on behalf of a partner company. They are looking for a Cloud Platform & AI Engineer based in India.

UK

  • Build and maintain backend services, Python libraries, and model lifecycle tooling for internal ML teams.
  • Design and operate distributed systems for model serving, evaluation, and feature engineering.
  • Focus on developer experience and reliability to help teams train, deploy, and serve ML models safely.

Monzo is on a mission to make money work for everyone, offering personal and business bank accounts, savings, investments, and more through a modern digital banking platform. With around 600 engineers out of roughly 5,000 employees, we value flexibility, collaboration, and open source contributions.

EMEA

  • Build and operate production-grade model serving infrastructure using vLLM, TGI, or Triton frameworks.
  • Design and implement auto-scaling, multi-model architectures, and intelligent request routing for ML inference.
  • Optimize GPU utilization, memory efficiency, and observability to ensure low-latency, cost-effective systems.

They are a distributed cloud infrastructure startup building AI-native cloud services with GPU-powered compute. The company is well-funded, fast-scaling, and operates in a remote-first environment with a focus on sustainability and decentralization.

Germany Unlimited PTO

  • Design and deliver production-grade AI/ML and GenAI solutions on cloud platforms like AWS, ensuring scalability and business value.
  • Act as a senior technical advisor for enterprise customers, translating challenges into secure, cost-efficient cloud architectures.
  • Develop reusable frameworks and best practices from delivery work to scale successful solutions across customers.

Jobgether is an AI-powered job matching platform that connects candidates with hiring companies. The company operates with a global, remote-first team focused on efficient recruitment.