Source Job

US Unlimited PTO

  • Build Enterprise-Scale Infrastructure leveraging infrastructure-as-code to manage complex cloud environments.
  • Sustain Platform Health and Performance owning critical systems in production, including reliability and security.
  • Enable Teams and Customers to Move Faster creating abstractions and tooling that deploy, run, and scale AI/ML workloads.

Terraform Kubernetes Istio Go TypeScript

20 jobs similar to Staff Software Engineer, ML Platform

Jobs ranked by similarity.

  • Lead the design, implementation, and continuous improvement of our cloud-native platform infrastructure.
  • Create and maintain tooling and automation that improves efficiency and developer experience.
  • Drive platform optimization initiatives focused on performance, cost efficiency, and reliability.

Intelerad's medical imaging solutions streamline the flow of information, simplifying complex processes, maximizing efficiencies, and shining a light on the unknown.

  • Helping improve the infrastructure and data platform using a lean approach.
  • Creating a data platform and infrastructure optimized for development using machine learning and massive data processing.
  • Improving the development experience and spreading the DevOps culture in the company.

Clarity AI is a global tech company founded in 2017 with a mission to bring societal impact to markets. They leverage AI and machine learning to provide data, methodologies, and tools to investors, governments, companies, and consumers for informed decisions; they are a team of over 300 individuals with offices in New York, Madrid, London, Paris, and Abu Dhabi, backed by investors like BlackRock and SoftBank.

ANZ

  • Building world-class AI infrastructure to support a 100+ person research team.
  • Designing and scaling multi-cloud systems that support high-performance model training and inference.
  • Improving monitoring, alerting and system observability for AI workloads.

Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.

India

  • Design and manage AWS infrastructure for AI services.
  • Implement Infrastructure as Code using Terraform.
  • Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

$75,000–$100,000/yr

Help build and operate core cloud-native systems including VKE, VLB, VCR, Vultr Inference, NAT Gateways, and our internal APIs. The ideal candidate has a strong understanding of Kubernetes components, container runtime internals, and modern IaC/automation practices. This role will have a direct impact on Vultr’s global cloud infrastructure footprint.

Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.

Canada 5w PTO

  • Design and evolve infrastructure systems to ensure scalability, reliability, and cost efficiency.
  • Lead and mentor a distributed infrastructure team, fostering a collaborative and inclusive culture.
  • Oversee all cloud environments supporting MZLA’s products and business systems.

MZLA Technologies Corporation (MZLA) is a wholly owned, for-profit subsidiary of the Mozilla Foundation and home to Thunderbird. They are a small but growing team of 50+ people distributed across seven countries building an open-source email and productivity platform.

US

  • Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
  • Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
  • Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.

Americas EMEA Unlimited PTO

  • Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
  • Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
  • Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

$121,210–$147,351/yr
Ireland Unlimited PTO

  • Design, operate, and scale storage-based infrastructure systems across multiple tenancy models and public clouds.
  • Deepen our team’s expertise in relational databases, search, caching, queuing, and streaming.
  • Partner with Architecture, Release Engineering, Network, Compute, and Security teams.

dbt Labs is the pioneer of analytics engineering, helping data teams transform raw data into reliable, actionable insights. They have grown from an open source project into the leading analytics engineering platform, and believe in empowering data practitioners.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on.
  • Boost development velocity and productivity.
  • Build systems to a high engineering standard and hold others to the same high standard.

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

Global

  • Automate infrastructure provisioning, configuration management, monitoring, and operational workflows using IaC and scripting languages.
  • Own the deployment, maintenance, and lifecycle management of systems supporting engineering, leveraging deep expertise in Kubernetes.
  • Troubleshoot complex infrastructure and application issues, driving root-cause analysis and developing long-term remediation solutions.

SingleStore delivers the cloud-native database with the speed and scale to power the world’s data-intensive applications. They are venture-backed and headquartered in San Francisco with offices in Sunnyvale, Raleigh, Seattle, Boston, London, Lisbon, Bangalore, Dublin and Kyiv.

  • Contribute to our core product, primarily in Go, on services that power our applications.
  • Design and refine technical systems, helping to shape them to remain scalable, reliable, and elegant.
  • Collaborate closely across disciplines to explore problems, prototype ideas, and iterate quickly.

Humanitec is reshaping how enterprises build and run their cloud-native setups and helps teams build Internal Developer Platforms (IDPs) that unlock true developer self-service. They are a fully remote company where small teams work closely.

US

  • Ensure the smooth operation and high availability of Clarifai's core services.
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency.
  • Design and implement scalable, secure, and cost-effective infrastructure solutions.

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

Europe

  • Design, build, and scale systems, APIs, and tools for efficient software deployment and management.
  • Contribute to creating secure, reliable, and scalable software that enhances developer workflows and automates infrastructure capabilities.
  • Improve the overall efficiency and effectiveness of the development process.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

US Canada

  • Design, build, and maintain our petabyte-scale data and ML platform.
  • Ensure reliability, security, scalability, and performance across our internal systems.
  • Automate deployment pipelines, monitoring, and alerting for ML and data services.

Serve Robotics is reimagining how things move in cities with its personable sidewalk robot designed to take deliveries away from congested streets.

US Canada Argentina India

  • Work with research teams to design and build our training infrastructure.
  • Prototype new training frameworks and productionize solutions at scale.
  • Design, optimize and test model integration infrastructure.

Clarifai is a leading AI platform specializing in computer vision, NLP, LLMs, and audio recognition, helping organizations transform unstructured data into structured data. Founded in 2013, they operate remotely across multiple countries with backing from industry leaders, fostering a diverse and equal opportunity workplace.

$167,249–$216,090/yr
US

  • Contribute to the design of a scalable cloud infrastructure platform on Google Cloud.
  • Develop and maintain infrastructure automation using Terraform and Kubernetes controllers.
  • Ensure cloud infrastructure adheres to best practices for security and compliance.

Virta Health is dedicated to reversing metabolic disease in one billion people. They innovate through technology, personalized nutrition, and virtual care, partnering with health plans, employers, and government organizations, with over $350 million raised from investors.

Europe

  • Heavily contribute to the architecture and migration of our CI/CD platform.
  • Act as a pragmatic driver and senior contributor, responsible for designing and implementing solutions.
  • Design and build the paved path as a product, ensuring it is reliable, secure, and well-documented.

Glia is the leading AI customer service solution for banks and credit unions offering AI and human agents across every voice and digital conversation.

Europe

  • Design and implement the "Golden Paths"—standardized, automated templates for microservices and infrastructure.
  • Develop the CLI tools, portals, or API interfaces that abstract the complexity of our cloud infrastructure.
  • Develop and maintain a library of modular, testable, and versioned Terraform modules.

SEON is a command center for fraud prevention and AML compliance, helping companies stop fraud, reduce risk and protect revenue. Powered by real-time, first-party data signals, they enrich customer profiles, flag suspicious behavior and streamline compliance workflows.

Canada

  • Work with cutting-edge infrastructure tools like Docker, Kubernetes, Terraform, Helm, and Istio.
  • Accelerate development across the company with faster, safer, and more frequent deploys.
  • Meaningfully improve developer happiness and productivity across the company with better development tools and workflows.

Super.com helps people save more, earn more, and get more out of life. For employees, it is an opportunity to grow, make an impact, and unlock their full potential; they invest in learning, celebrate bold ideas, and create pathways for career growth.