Remote DevOps Jobs · Kubernetes

Job listings

  • Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
  • Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
  • Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.

SSV Labs is the core team behind the SSV Network, pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.

$60,000–$175,000/yr

  • Lead the push toward a modern, cloud-native organization by designing and managing scalable, resilient systems on AWS.
  • Own the Infrastructure as Code (IaC) strategy using Terraform, ensuring environments are repeatable, versioned, and stable.
  • Build and optimize high-velocity deployment pipelines using GitHub Actions, ArgoCD, and Helm to get code from "commit" to "production" seamlessly.

TrueML is undergoing a major platform rearchitecture, moving toward a fully cloud-native, modernized infrastructure. They seem to be a medium-sized company with a focus on innovation and providing engineers with the tools and data they need to make smart, impactful choices.

$198,025–$287,952/yr

  • Building tools and applications to extend Calendly’s infrastructure platform
  • Evaluating and deploying cloud native open source tools
  • Exercising expertise in cloud infrastructure concepts and patterns

Calendly's product powers connections for millions through impactful innovation. They are in the midst of exciting growth and are looking for people who want to learn, grow, and do their best work.

  • Assess and evolve Zipline's engineering practices to surface the highest-impact opportunities and co-develop a prioritized QE roadmap with engineering leads.
  • Strengthen CI/CD quality gates, including linting, security scanning (SAST/SCA), and test coverage checks, expanding coverage across pipelines while keeping builds fast and teams unblocked.
  • Build visibility into engineering health with quality dashboards that surface test coverage, failure rates, and deployment trends, paired with enhanced monitoring and alerting.

Zipline is a SaaS company transforming how frontline teams work, empowering leading brands across retail, healthcare, logistics, and more to connect, align, and inspire their employees from headquarters to the front lines. They are a fully remote company, with passionate employees across the U.S., Canada, and around the globe.

  • Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training.
  • Serve as the primary technical point of contact for customers running large-scale training workloads.
  • Build production-grade automation for cluster provisioning, GPU health checks, job scheduling, self-healing, and firmware/driver lifecycle management.

Andromeda Cluster gives early-stage startups access to scaled AI infrastructure. They work with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most, and they are expanding to bring in the brightest people in AI infrastructure, research, and engineering.

  • Design and build reusable platform solutions empowering engineering and SRE teams across AWS, Azure, and GCP.
  • Spearhead the evolution of our Packer-driven VM image pipelines, establishing standardized, maintainable processes.
  • Lead application migrations into GCP while rapidly mastering our complex, multi-cloud infrastructure footprint.

TELUS Agriculture and Consumer Goods (TAC) is committed to disrupting the status quo with state-of-the-art applications that leverage data to reimagine the way we approach food. TAC is composed of inspired individuals united in passion and purpose, working collaboratively to bring extraordinary opportunities to life.

US · 6 weeks PTO

  • Own the desktop execution platform: session lifecycle, remote access / execution, and integration with the AI pipeline
  • Build and evolve the remote desktop substrate (VNC/RFB, RDP) that connects agents to Windows sessions
  • Work deeply with ML-focused team members on bringing in additional context to recording and execution pipelines

Sola is transforming the way work is done by developing AI agents that make automation effortless, enabling users to record workflows and scale them up. They have raised $21M in funding from a16z, Conviction, and Y Combinator, and are looking for passionate engineers to help them scale.

  • Leading a team focused on designing, building, and evolving cloud-native, containerized infrastructure.
  • Driving complex technical initiatives and ensuring the availability, security, scalability, and reliability of our data ecosystem.
  • Guiding and developing engineering talent, setting priorities, driving execution, and partnering across teams.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing and financial market infrastructure. Pismo has 500+ employees located in more than 10 countries around the world and was acquired by Visa in 2024.

  • Own the design, deployment and operation of OpenStack and Kubernetes environments.
  • Build and improve infrastructure using infrastructure-as-code and GitOps practices.
  • Optimise GPU workload scheduling using Kubernetes and NVIDIA tooling.

NexGen Cloud is building next-generation GPU cloud infrastructure and is the company behind Hyperstack, a high-performance cloud platform designed for compute-intensive workloads. They are a scale-up by design, solving complex infrastructure challenges at pace, with real-world impact.