Source Job

United States Canada

  • Build and operate the real-time inference service for the risk decision engine with low latency and high availability.
  • Own model deployment infrastructure including CI/CD, shadow mode, and staged rollouts.
  • Build model observability and partner with Risk Data Science for production operation.

Python MLOps SQL FastAPI Kafka

13 jobs similar to Senior Machine Learning Operations Engineer

Jobs ranked by similarity.

Canada

  • Design and operate core AI platform components for training, deploying, and serving ML models at scale.
  • Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
  • Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.

Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly owned subsidiary of the Mozilla Foundation, the company is mission-driven and focused on privacy and open standards.

  • Design and build scalable ML training, deployment, and inference pipelines using CI/CD and cloud infrastructure.
  • Implement MLOps for model versioning, monitoring, and automated retraining to detect drift and performance degradation.
  • Partner with Data Scientists and Product teams to productionise models and integrate ML into customer-facing products.

We develop solutions that make an impact for companies around the globe. Our culture embraces openness, acts with respect, shows grit & guts, and combines employment with enjoyment.

US

  • Design, build, and deploy AI/ML solutions from prototype to production for client business problems.
  • Apply generative AI and LLMs, establishing MLOps best practices including CI/CD and model monitoring.
  • Serve as a trusted technical advisor, translating ambiguous problems into well-scoped solutions and presenting to stakeholders.

DevIQ builds modern cloud and data solutions for mid-market companies focused on energy reduction, healthcare, education, and smart cities. The company offers competitive benefits, a strong team culture, and opportunities to work on end-to-end solutions with multi-disciplinary teams.

$81,112–$92,025/yr
Europe

  • Empower ML Engineers with the tools, infrastructure, and frameworks they need to iterate quickly and autonomously.
  • Accelerate time-to-market for production-ready ML products with seamless integration and access to data and resources.
  • Own ML CI/CD in close collaboration with the DevExp team, adapting existing frameworks to ML-specific needs.

Dailymotion is a video platform designed to broaden users' horizons with a unique algorithm. They foster inclusivity and aim to build a better and safer Internet with cutting-edge solutions for video hosting and advertising. With 400 employees in France, New York, and Singapore, Dailymotion is shaking up the global video platform ecosystem.

Global 4w PTO

  • Take ownership of the ML API serving NBA recommendations and harden it for low-latency production traffic.
  • Ship your first agent tool contract end-to-end: schema design, handler implementation, and unit tests.
  • Set up the eval foundation for agents with golden transcripts, rubric-based judges, and regression suites.

Clutch is a vertical SaaS company backed by Andreessen Horowitz that helps credit unions become fintech lenders, providing affordable lending solutions to over 130 million Americans. The team is small, ambitious, and shipping fast with a culture that values pragmatism and real autonomy.

US Canada

  • Assess current pipelines and data architecture to produce a prioritized plan for change.
  • Design durable data and ML systems grounded in customer needs with documented tradeoffs.
  • Harden pipelines, upgrade data architecture, and raise standards for observability and reliability.

FutureFit AI's core mission is to help more people get to better jobs faster and cheaper, with a focus on those facing barriers to opportunity. Their team of 30 to 50 people across the US and Canada fosters a high-trust, high-intensity culture with a will to win.

Global 6w PTO

  • Build, optimize, and embed machine learning models for on-device inference within the QSIDS detection engine.
  • Collaborate closely with systems engineers to integrate models efficiently into a Go-based engine.
  • Take models all the way to production and own them once they're running, monitoring performance, detecting drift, and iterating to keep them reliable.

Qohash builds the zero-copy data security control layer that lets enterprises adopt AI safely. The company has a strong culture centered on five core values: pursuit of excellence, resilience, mission focus, accountability, and embracing conflict.

Global 16w maternity 16w paternity

  • Design, train, evaluate, and ship ML systems for governance and security, starting with prompt injection detection and behavioral anomaly detection.
  • Build supporting infrastructure including data pipelines, feature stores, model serving, and evaluation harnesses.
  • Set technical direction for ML work, own architecture, evaluation methodology, and model lifecycle.

Docker provides developer tools for building, sharing, and running applications across Docker Desktop, Docker Hub, and Docker Scout. With over 20 million monthly users and a globally distributed remote-first team, Docker is trusted by everyone from solo founders to the world's largest companies.

US

  • Own the technical design and delivery of subsystems in a high-throughput, low-latency inference platform.
  • Develop robust API layers and SDKs that abstract complex distributed inference orchestration.
  • Build and harden a multi-tenant control plane for metering, rate limiting, and tenant isolation.

Stack develops revolutionary AI and autonomous systems to enhance safety and efficiency in trucking. The team has decades of experience deploying real-world systems and is committed to inclusion, entrepreneurship, and innovation.

India

  • Design end-to-end AI integration architectures connecting LLM APIs, vector databases, and inference systems to existing backend infrastructure.
  • Build reusable ML infrastructure components like feature pipelines, model serving layers, and evaluation frameworks that multiple portfolio companies standardize on.
  • Establish AI system integration best practices and governance patterns that become repeatable playbooks across the holding company.

Emergence is a thematic holding company backed by the Pritzker Organization, focused exclusively on acquiring and scaling category-defining software businesses. They invest through focused portfolios and specialized operating groups with deep domain expertise and proven playbooks.

US Unlimited PTO

  • Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
  • Build inference infrastructure and observability stacks for LLM-powered workflows.
  • Drive security, compliance, and governance at the systems level in a regulated healthcare environment.

Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. A publicly traded company on the NYSE (HIMS), it offers flexible and remote work and a culture centered on innovation and employee well-being.

Brazil

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs empowering teams to transition from centralized bottlenecks to self-service excellence.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.

$125,000–$150,000/yr
US Unlimited PTO

  • Design and build systems and manage scalable ML pipelines using Vertex AI Pipelines for training, evaluation, and deployment to support ranking, retrieval, and recommendation personalization use cases.
  • Develop and maintain data pipelines that support feature generation, model training, and analytics workflows, and own vector generation, storage, and retrieval workflows using Milvus.
  • Implement model serving solutions using KServe and build FastAPI APIs for low-latency inference. Build observability and monitoring for models and pipelines.

People Inc. is America’s largest digital and print publisher. Our 40+ iconic and fast-growing brands harness the best intent-driven content, the fastest sites, and the fewest ads to help nearly 200 million people every month.