Source Job

Global 4w PTO

  • Own the ML serving API and deploy models to production with CI/CD and infrastructure as code.
  • Build monitoring, alerting, and reliability for NBA models and LLM agents.
  • Drive architectural decisions and mentor engineers on MLOps patterns.

Python AWS Terraform CI/CD MLOps

20 jobs similar to Senior MLOps Engineer

Jobs ranked by similarity.

US Unlimited PTO 16w maternity 4w paternity

  • Build and operate the ML lifecycle platform, including tooling for experiment tracking, model registry, and versioned pipelines.
  • Own CI/CD and deployment for ML workloads, building automated pipelines from notebook to production.
  • Make models observable and reliable in production with monitoring for latency, drift, data quality, and cost signals.

dv01 provides a data analytics platform for the structured finance market, offering transparency into investment performance and risk for lenders and Wall Street investors. With over 400 clients and coverage of over 100 million loans, dv01 is a data-first company with a diverse and innovative culture.

Global 4w PTO

  • Take ownership of the ML API serving NBA recommendations and harden it for low-latency production traffic.
  • Ship your first agent tool contract end-to-end: schema design, handler implementation, and unit tests.
  • Set up the eval foundation for agents with golden transcripts, rubric-based judges, and regression suites.

Clutch is a vertical SaaS company backed by Andreessen Horowitz that helps credit unions become fintech lenders, providing affordable lending solutions to over 130 million Americans. The team is small, ambitious, and shipping fast with a culture that values pragmatism and real autonomy.

  • Design and build scalable ML training, deployment, and inference pipelines using CI/CD and cloud infrastructure.
  • Implement MLOps for model versioning, monitoring, and automated retraining to detect drift and performance degradation.
  • Partner with Data Scientists and Product teams to productionise models and integrate ML into customer-facing products.

We develop solutions that make an impact for companies around the globe. Our culture embraces openness, acts with respect, shows grit & guts, and combines employment with enjoyment.

US

  • Design, build, and deploy AI/ML solutions from prototype to production for client business problems.
  • Apply generative AI and LLMs, establishing MLOps best practices including CI/CD and model monitoring.
  • Serve as a trusted technical advisor, translating ambiguous problems into well-scoped solutions and presenting to stakeholders.

DevIQ builds modern cloud and data solutions for mid-market companies focused on energy reduction, healthcare, education, and smart cities. The company offers competitive benefits, a strong team culture, and opportunities to work on end-to-end solutions with multi-disciplinary teams.

United States Canada

  • Build and operate the real-time inference service for the risk decision engine with low latency and high availability.
  • Own model deployment infrastructure including CI/CD, shadow mode, and staged rollouts.
  • Build model observability and partner with Risk Data Science for production operation.

Mercury is a fintech company that provides banking services for startups via partner banks. The company is committed to creating a safe environment and values diversity, with a growing team focused on innovation.

Brazil

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs that let teams move from centralized bottlenecks to self-service workflows.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.

Canada

  • Design and operate core AI platform components for training, deploying, and serving ML models at scale.
  • Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
  • Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.

Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly owned subsidiary of the Mozilla Foundation, the company is mission-driven and focused on privacy and open standards.

US

  • Design and develop production-grade AI/ML services and web applications from proof-of-concept to scalable platforms.
  • Implement MLOps best practices, CI/CD pipelines, and cloud deployment for AI/ML workloads.
  • Collaborate with cross-functional teams to integrate AI capabilities into engineering workflows.

Cayuse Civil Services, LLC provides enterprise AI and engineering solutions for government and infrastructure clients. The company values innovation, excellence, collaboration, adaptability, and integrity, fostering a culture of teamwork and quality.

US Unlimited PTO

  • Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
  • Build inference infrastructure and observability stacks for LLM-powered workflows.
  • Drive security, compliance, and governance at the systems level in a regulated healthcare environment.

Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. As a publicly traded company on the NYSE (HIMS), it offers flexible/remote work and a culture centered on innovation and employee well-being.

  • Own reliability, latency, and performance for AI platform services and data infrastructure on AWS.
  • Design and maintain CI/CD pipelines, infrastructure-as-code, and observability frameworks across the stack.
  • Partner with AI and data engineers to ensure secure, cost-optimized, and scalable deployment of platform components.

HHAeXchange is the leading technology platform for home and community-based care, providing an end-to-end homecare solution for people who are aging or have disabilities. Founded in 2008, the company is passionate about transforming healthcare by connecting patients, providers, managed care organizations, and states.

US Unlimited PTO

  • Drive end-to-end ML development for customer-facing SaaS products, from pipelines to production deployment and monitoring.
  • Design evaluation strategies and A/B tests to prove ML features improve customer outcomes and business impact.
  • Influence product roadmap by communicating ML capabilities and trade-offs to cross-functional teams.

WorkWave provides field service and logistics software solutions that help businesses manage their operations and serve their customers. They are a global company with a remote-first culture, recognized as a Best Place to Work and named among the top software companies worldwide.

US Unlimited PTO

  • Own the US-only production environment end-to-end, including infrastructure deployment, maintenance, scaling, and reliability.
  • Lead and grow the US-based DevOps team, design scalable AWS infrastructure, and build CI/CD pipelines for safe, fast shipping.
  • Partner with engineering on application error investigations, improve monitoring and alerting, and coordinate with the Tel Aviv team on shared platform standards.

Zafran de-risks 90% of critical vulnerabilities overnight across hybrid environments using existing security tools. Backed by Sequoia Capital and Cyberstarts, it is one of the fastest-growing companies in cybersecurity, scaling to meet demand from advanced organizations.

US Canada

  • Assess current pipelines and data architecture to produce a prioritized plan for change.
  • Design durable data and ML systems grounded in customer needs with documented tradeoffs.
  • Harden pipelines, upgrade data architecture, and raise standards for observability and reliability.

FutureFit AI's core mission is to help more people get to better jobs faster and cheaper, with a focus on those facing barriers to opportunity. Their team of 30-50 across the US and Canada fosters a high-trust, high-intensity culture with a will to win.

United States 4w PTO

  • Own and improve infrastructure, deployment systems, and operational foundation for reliability and security.
  • Build safer deployment paths, strengthen observability, and lead infrastructure migrations.
  • Partner with engineers on scaling, error handling, and backend changes to support AI-enabled workflows.

Clever is a venture-backed real estate technology company that builds a leading online education platform and has earned a 4.9 Trustpilot rating. The company has helped consumers save over $210 million in real estate fees and fosters a culture of innovation and transparency.

Canada

  • Build and maintain infrastructure platforms for over 200 backend services running on Kubernetes clusters with 40,000+ cores.
  • Lead and mentor other engineers, own complex infrastructure failures, and participate in a shared on-call rotation.
  • Drive cloud cost efficiency, estimate schedules, and use AI tools as first-class collaborators in daily workflows.

Life360's mission is to keep people close to the ones they love through location sharing, safe driver reports, and crash detection. The company serves approximately 97.8 million monthly active users across more than 180 countries and has more than 500 remote-first employees.

US

  • Lead design and operation of internal developer platforms and self-service infrastructure.
  • Build and optimize CI/CD pipelines, deployment workflows, and automation across GitHub Actions, Jenkins, and ArgoCD.
  • Apply SRE principles to improve developer-facing systems and software delivery performance.

Versant is a media company owning iconic brands in news, sports, and entertainment, including USA Network, Fandango, and Rotten Tomatoes. It is an independent, publicly traded company with a collaborative, inclusive culture and a remote-first work environment.

United States

  • Design and build core platform infrastructure for large-scale cloud-native data and analytics systems.
  • Own and improve CI/CD pipelines, testing frameworks, and deployment in a high-scale PaaS environment.
  • Contribute to reliability engineering, observability, and operational excellence across distributed systems.

Jobgether uses an AI-powered matching process to connect candidates with roles. The company is a growing platform focused on efficient job matching and data privacy compliance.

Europe

  • Design, build, and maintain scalable cloud infrastructure for an AI-powered platform.
  • Manage and optimize AWS environments, develop Infrastructure as Code using Terraform, and build CI/CD pipelines.
  • Troubleshoot production issues and implement security best practices across infrastructure and deployment pipelines.

Global 6w PTO

  • Build, optimize, and embed machine learning models for on-device inference within the QSIDS detection engine.
  • Collaborate closely with systems engineers to integrate models efficiently into a Go-based engine.
  • Take models all the way to production and own them once they're running, monitoring performance, detecting drift, and iterating to keep them reliable.

Qohash builds the zero-copy data security control layer for enterprises to adopt AI safely. The company has a strong culture centered on five core values: pursuit of excellence, resilience, mission focus, accountability, and embracing conflict.

Ireland

  • Design and develop machine learning solutions ensuring accuracy, performance, security, and scalability.
  • Implement and maintain end-to-end AI/ML pipelines from data ingestion to deployment.
  • Collaborate across planning, design, and code review to raise overall code quality.

We shape the future of communications from remote-first environments. We deliver innovative solutions to hundreds of thousands of businesses and empower millions of developers worldwide, with a strong culture of connection and inclusion.