Remote DevOps Jobs · Kubernetes

Job listings

Europe 6w PTO

  • Operate and evolve multi-cloud streaming clusters and related database infrastructure, diagnosing and eliminating cross-layer failure modes.
  • Define and evolve the technical direction for operating shared database systems at scale, leading complex initiatives and reliability investments.
  • Mentor and support engineers, reduce systems toil through automation, and partner with database and platform teams to align on strategy.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics, logs, and traces. They thrive in an innovation-driven environment where transparency, autonomy, and trust fuel everything.

  • Own and drive the design, deployment, and operation of OpenStack and Kubernetes clusters optimised for GPU workloads
  • Lead and develop a team of 4–5 infrastructure engineers, setting clear direction and standards
  • Build and improve infrastructure through automation (IaC, GitOps, CI/CD pipelines)

NexGen Cloud is a fast-growing company building next-generation GPU cloud infrastructure. At the core of NexGen Cloud is a team of curious, driven people who care deeply about quality, ownership and collaboration.

  • Design and maintain robust ML deployment pipelines to ensure seamless model delivery.
  • Automate model training, deployment, and monitoring workflows to increase operational efficiency.
  • Collaborate closely with Data Scientists and Engineering teams to integrate models into production environments.

Truelogic is a leading provider of nearshore staff augmentation services, headquartered in New York. With more than 600 highly skilled tech professionals based in Latin America, they drive digital disruption by partnering with U.S. companies on their most impactful projects.

  • Design, implement, and maintain cloud-based infrastructure and services at the intersection of agentic AI and biomedical data.
  • Collaborate with software engineers, data engineers, researchers and data scientists to understand their needs and implement solutions that enhance their productivity.
  • Build and lead a high-performing platform engineering team, setting a high bar for technical excellence, ownership, and accountability in the organization.

Owkin is an AI company on a mission to solve the complexity of biology. They are building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software.

$145,000–$170,000/yr
US Unlimited PTO 12w maternity 12w paternity

  • Learn platform infrastructure, developer tooling, and deployment patterns.
  • Own and drive at least one architecture decision that improves platform reliability.
  • Ship infrastructure improvements that measurably improve developer experience or platform stability.

Homebot is a homeownership platform for lenders and real estate, title & insurance agents that drives client retention and partner referrals. They maintain a clear focus on culture, engagement, and creating an environment where people are valued and can thrive.

$205,000–$220,000/yr

  • Partner with Sales and Field Engineering to design and architect complex, enterprise-grade solutions tailored to customer needs.
  • Lead the implementation of custom solutions within customer environments across multi-cloud and hybrid architectures.
  • Optimize solutions for performance, scalability, and reliability in production environments.

Striim is a unified data integration and streaming platform that connects clouds, data, and applications. They believe in, and expect, all of their employees operating as one with unlimited potential and dignity.

$95,189–$116,383/yr
Global Unlimited PTO

  • Partner with product and platform engineering teams to improve system reliability, scalability, and developer experience
  • Build, maintain, and evolve CI/CD pipelines to support safe, fast, and reliable deployments
  • Improve observability through better monitoring, alerting, logging, and telemetry

Zipline is a SaaS company transforming how frontline teams work. They empower leading brands across retail, healthcare, logistics, and beyond. Zipline is a fully remote company with employees across the U.S., Canada, and around the globe.

  • Support the availability and durability of critical services across production environments.
  • Develop automation for common operational tasks, reducing manual intervention and toil.
  • Partner with engineering, product, and operations teams to support resilient system design and operations.

Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business with less than $3 million in outside funding until 2021, and now generate over $100 million in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.

Europe 4w PTO

  • Design, migrate, and operate cloud-native platforms across AWS, GCP, and OCI.
  • Build and maintain Infrastructure as Code using Terraform across multiple cloud providers.
  • Apply SRE best practices to improve availability, performance, and reliability (SLIs/SLOs, monitoring, alerting).

Apriorit is a software engineering company established in 2002, with experience in system programming, cybersecurity, reverse engineering, SaaS/Web, blockchain-based solutions, and AI. With over 400 specialists, they help tech companies around the world turn their challenging ideas into secure and viable products.

  • Build Self-Service Infrastructure: Design and scale highly available Infrastructure as Code (IaC) modules using Terraform. Empower development teams to provision resources autonomously and securely.
  • Champion Platform Reliability: Partner closely with engineering teams to define, measure, and operationalize SRE metrics. Balance feature velocity with system stability.
  • Elevate Developer Experience (DevEx): Architect frictionless, GitOps-driven CI/CD pipelines utilizing GitHub Actions and ArgoCD. Facilitate automated, secure, and progressive deployments.

KTO Group drives excitement in iGaming through innovation, focusing on transparency and player satisfaction. Founded in 2018, KTO blends sports betting with online casino entertainment on a proprietary platform, and is a rising leader in LATAM, ranked among Brazil’s top 10 iGaming brands.