20 jobs similar to Senior Machine Learning Ops Engineer

Jobs ranked by similarity.

Canada

  • Define, drive, design, and build/ship end-to-end solutions that solve real customer problems.
  • Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
  • Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.

Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2000 employees worldwide and is recognized with Top Employer awards.

India

  • Collaborate with engineering and cross-functional teams to translate business problems into an ML product roadmap.
  • Contribute hands-on technical expertise as a player-coach, providing strategic direction and mentorship to the team.
  • Establish an engineering setup enabling rapid iteration, experimentation, and deployment of models, fostering operational excellence.

Twilio is shaping the future of communications by delivering innovative solutions to hundreds of thousands of businesses. They empower millions of developers worldwide to craft personalized customer experiences, emphasizing a remote-first culture with a vibrant and globally inclusive team.

US

  • Design, build, and operate core cloud infrastructure across compute, storage, databases, and networking layers.
  • Own and improve the reliability, scalability, and security of Valon’s production systems as we scale to support major enterprise deployments.
  • Evaluate, adopt, and operationalize new infrastructure technologies (e.g., Vitess, ClickHouse, Redis) to meet evolving product and scale requirements.

Valon is building the AI-native operating system for regulated finance, starting with mortgage servicing. They're a Series C company backed by a16z, transforming industries that others have written off as too complex to innovate.

$81,112–$92,025/yr
Europe

  • Empower ML Engineers with the tools, infrastructure, and frameworks they need to iterate fast autonomously.
  • Accelerate time-to-market for production-ready ML products with seamless integration and access to data and resources.
  • Own ML CI/CD in close collaboration with the DevExp team, adapting existing frameworks to ML-specific needs.

Dailymotion is a video platform designed to broaden users' horizons with a unique algorithm. They foster inclusivity and aim to build a better and safer Internet with cutting-edge solutions for video hosting and advertising. With 400 employees in France, New York, and Singapore, Dailymotion is shaking up the global video platform ecosystem.

Colombia

  • Design, implement, and deploy ML/AI models end-to-end, from concept through production, including data pipelines, training workflows, and optimization.
  • Maintain and evolve AI systems in production, monitoring for drift, debugging issues, and driving ongoing improvements to reliability and scalability.
  • Partner closely with product, engineering, and data teams to align AI work with broader product and business goals.

Robots & Pencils is an applied AI engineering firm that designs and ships AI co-workers integrating into operations and delivering results for clients. Founded in 2009, they have delivery centers in Canada, the United States, Eastern Europe, and Latin America, with teams averaging 15+ years of experience.

$150,000–$170,000/yr
US

  • Design, implement, and maintain reliable, scalable, and secure infrastructure, applications, and tooling, with a focus on our ML/AI pipelines and workloads.
  • Write clean, maintainable code, and perform peer code reviews.
  • Write clear and concise documentation and engage in cross-team communication and knowledge sharing.

Bright Machines is a next-generation, AI-enabled manufacturer focused on data center infrastructure assembly operations. The company utilizes AI-based robotics and software to assemble AI infrastructure hardware products for hyperscalers and leading OEMs, employing under 500 employees, with a culture rooted in innovation and expertise.

India

  • Design end-to-end AI integration architectures connecting LLM APIs, vector databases, and inference systems to existing backend infrastructure.
  • Build reusable ML infrastructure components like feature pipelines, model serving layers, and evaluation frameworks that multiple portfolio companies standardize on.
  • Establish AI system integration best practices and governance patterns that become repeatable playbooks across the holding company.

Emergence is a thematic holding company backed by the Pritzker Organization focused exclusively on acquiring and scaling category-defining software businesses. They invest in focused portfolios, specialized operating groups with deep domain expertise and proven playbooks.

Poland

  • Design and deploy GPU cluster architectures using tools like Ansible, Terraform, Kubernetes, and Slurm.
  • Lead technical deep-dives, workshops, and present solutions to stakeholders, translating complex concepts.
  • Automate provisioning and monitoring with Infrastructure as Code, and produce documentation and training materials.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. The company collaborates with leading technology partners and employs over 550 professionals building foundational technologies.

$145,000–$250,000/yr
US Unlimited PTO

  • Construct infrastructure as code, developing and enforcing best practice across configurations while preventing drift between Terraform configurations and infrastructure deployments.

SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.

Europe

  • Design, build, and maintain scalable services that support the AI lifecycle.
  • Develop tools for pre/post-processing data for AI and other usage.
  • Design scalable pipelines for data collection, processing, and transformation.

Planner 5D is a global hub for home design, uniting over 100 million users. They simplify the home renovation process with their cutting-edge software, fostering a vibrant community of enthusiastic and product-oriented professionals.

Europe

  • Define and evolve the architecture and roadmap for enterprise‑scale Data and AI platforms.
  • Design and build multi‑tenant, multi‑region, highly available AI platforms with governance.
  • Lead capacity planning and cost optimization strategies for GPU and CPU workloads.

NEORIS accelerates growth in Ibero‑America, combining global engineering with regional expertise. With over 60,000 professionals across 55+ countries, they offer technical specialization career paths and value responsibility, collaboration, creativity, and commitment.

$90,000–$150,000/yr
US

  • Design and deliver production AI and agentic systems across document intelligence, workflow automation, and copilots.
  • Define architecture decisions for LLM-based systems, including retrieval, tool use, orchestration, memory, and evaluation.
  • Own evals and observability for production AI and manage cost and latency at production volume.

Maxwell is a mortgage technology and fulfillment company with a mission to make lending simpler, faster, and more accessible. They power hundreds of lending institutions with their mortgage Point of Sale and related capabilities and are a remote-first team that takes craft seriously.

$165,000/yr
North America, Europe, Middle East, APAC

  • Implement and manage AI-powered tools, copilots, and workflow automations from POC to production, owning the full technical lifecycle.
  • Design, deploy, and maintain cloud infrastructure on AWS and Azure, including IAM, VPCs, security groups, multi-account strategies, and cost optimization.
  • Own reliability, observability, and security controls across all AI and cloud services, including incident response, debugging complex multi-service environments, and driving continuous improvement.

Dragos is dedicated to arming customers with best-in-class technology, threat intelligence, and services to protect their systems. They're a remote-first culture with operations in North America, Europe, the Middle East, and APAC, looking for mission-oriented teammates who embody their core values of authenticity, transparency, and trust.

US Unlimited PTO

  • Maintain, improve, and extend an AI platform already running in production.
  • Handle a mix of backend development, data pipelines, DevOps, and infrastructure work.
  • Translate business and product requirements into technical decisions independently.

Provectus is an AI consultancy and solutions provider that helps businesses adopt AI technologies, offering development and integration services. The job posting does not state the company's size, but it suggests a flexible, autonomous, and tech-forward culture.

Global

  • Own and operate GPU and accelerator clusters for AI training, inference, and experimentation, ensuring reliability and cost-efficiency.
  • Build and optimize scheduling, orchestration, and serving systems using frameworks like vLLM and Triton to improve latency, throughput, and memory efficiency.
  • Partner with ML engineers to remove workflow bottlenecks and build observability for GPU utilization, capacity, and incident response.

Kraken is a crypto exchange platform building premium financial products for traders and institutions, accelerating global crypto adoption. It is a mission-driven, fully remote company with a world-class team of crypto experts spread across more than 70 countries.

$120,000–$170,000/yr
Global Unlimited PTO

  • Own and evolve Quansight's cloud infrastructure across AWS, Azure, and GCP.
  • Build, deploy, and maintain internal dashboards and reporting for operations and project management.
  • Lead infrastructure engagements for clients from scoping and architecture through delivery, upskilling client teams.

Quansight is rooted in the Python and PyData ecosystems. They provide services ranging from open-source software development to training and consulting, believing in a culture of do-ers, learners, and collaborators.

Global Unlimited PTO

  • Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads.
  • Automate infrastructure provisioning using Infrastructure as Code tools such as Terraform and CloudFormation.
  • Operate and scale our Kubernetes platform, including autoscaling, ingress, and multi-tenant isolation for enterprise customers.

Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. It is a small team with high ownership, emphasizing automation, collaboration, and transparency.

$120,000–$160,000/yr
US

  • Design, develop, and deploy AI/ML models to automate and improve internal workflow.
  • Build and maintain ML pipelines within an AWS cloud environment.
  • Integrate ML capabilities into existing Java and React application workflows.

Oddball aims to improve daily lives by delivering quality software to the federal space. With a team of experienced engineering, product, and UX professionals, the company values learning, growth, and making a big impact while growing rapidly.

$125,000–$175,000/yr
US Unlimited PTO

  • Act as a trusted advisor to customers, building relationships with technical and business stakeholders.
  • Advise on GenAI and ML best practices, giving product demos to technical and business stakeholders.
  • Partner with product and engineering teams to drive the product roadmap and spearhead new opportunities within existing accounts.

Arize AI is transforming the world by helping teams monitor, troubleshoot, and optimize their AI systems with its AI & Agent Engineering observability and evaluation platform. They are a Series C company backed by top-tier investors, with over $135M in funding and a rapidly growing customer base.

Global

  • Design, build, and implement an end-to-end AI-powered tender response platform.
  • Develop and maintain backend services and AI workflows using AWS-native technologies.
  • Build and manage infrastructure using Terraform and cloud-native best practices.

Smart Working believes that your job should not only look right on paper but also feel right every day. They break down geographic barriers and connect skilled professionals with outstanding global teams and products for full-time, long-term roles.