Source Job

US Unlimited PTO 20w maternity 20w paternity

  • Design, build, and maintain highly available Kubernetes infrastructure at scale.
  • Lead design for components and features, and contribute to architecture decisions for container orchestration.
  • Mentor engineers on Kubernetes best practices and drive initiatives to improve system reliability.

Kubernetes Docker Terraform Go AWS

20 jobs similar to Senior Software Engineer - Kubernetes Operations

Jobs ranked by similarity.

Global

  • Design and implement AI inference and training cloud products optimized for Kubernetes, including autoscaling and distributed jobs across GPU fleets.
  • Write clean, efficient Go code for Kubernetes controllers, operators, and custom resources supporting AI workloads.
  • Build APIs, CLIs, and developer tools to simplify deployment, lifecycle management, and monitoring of AI applications.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. With 550+ professionals and 210+ edge locations, the company collaborates with partners like Intel, NVIDIA, and Equinix to build the foundation for an AI-driven world.

US Unlimited PTO

  • Build and operate the delivery platform across AWS, EKS, ArgoCD, Helm, and Terraform, fixing production problems and driving root-cause analysis.
  • Standardize CI/CD pipelines using GitHub Actions and Azure DevOps, implement progressive delivery with Argo Rollouts, and build observability with Grafana and Prometheus.
  • Support platform adoption, reduce toil and cost, unblock cross-team delivery, and write documentation to eliminate knowledge silos.

Attain Finance is a leading consumer credit lender with over 50 years of expertise providing credit solutions across the U.S. and Canada. The company employs a dynamic team that fosters innovation and collaboration, with a portfolio including brands like Cash Money, LendDirect, Heights Finance, and others.

US Unlimited PTO

  • Provide frontline technical expertise to help developers deploy and scale Temporal in cloud-native environments.
  • Troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.
  • Collaborate with engineering and product teams to influence platform improvements and enhance developer experience.

Temporal provides an open source programming model that simplifies code and makes applications more reliable. The company is a growing team driven by values of curiosity, collaboration, and humility, focused on improving developer experience.

Global 16w maternity 16w paternity

  • Lead the design and implementation of self-service platform infrastructure for provisioning, deployment, and observability across engineering teams.
  • Evolve multi-tenant EKS foundations toward better reliability, security, scale, and multi-region connectivity.
  • Set delivery standards using Terraform, GitOps, and progressive rollout, while improving SLOs and alerting on Grafana Cloud.

Docker is a developer tooling company trusted by over 20 million monthly users and 20 billion container image pulls. They are a globally distributed, remote-first team building tools that define how software gets built and delivered.

Germany Unlimited PTO

  • Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
  • Build and operate observability systems while leading incident response and reliability improvements.
  • Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.

This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.

US

  • Ensure reliability, availability, and observability for a large-scale cloud-based SaaS platform serving millions in education.
  • Design and maintain infrastructure-as-code and CI/CD pipelines while leading incident response and resolution.
  • Mentor peers and integrate AI-driven tools to improve SRE workflows and system performance.

Jobgether is an AI-powered job matching platform that connects candidates with hiring companies. The company manages the application process and uses AI to shortlist top-fitting candidates based on core requirements.

Global

  • Own and evolve Webshare's production infrastructure by leading the migration from Docker Swarm to Kubernetes and maintaining high availability across hundreds of servers and ~50 services.
  • Drive observability, establish IaC practices, improve CI/CD pipeline reliability, and participate in the on-call rotation alongside backend developers.
  • Contribute platform tooling to improve developer experience and reduce infrastructure toil, ensuring no silos and shared infrastructure ownership.

We develop cutting-edge proxy and web data scraping solutions for thousands of the world's best-known businesses, including Fortune 500 companies. We are a team of 500+ professionals with a culture focused on growth, learning, and shared infrastructure ownership.

Argentina 18w maternity 12w paternity

  • Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
  • Design and maintain the infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
  • Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.

Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.

US

  • Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
  • Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
  • Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.

VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.

Latin America

  • Design and maintain CI/CD processes and infrastructure as code using tools like Terraform and Kubernetes.
  • Troubleshoot and resolve issues across dev, testing, and production environments.
  • Work with high-growth technology clients to scale applications and improve operational practices.

Bluelight is a leading software consultancy designing and developing innovative technology to enhance users' lives. With a presence across the United States and Central/South America, it fosters a collaborative and enriching work environment where each team member can grow and thrive.

United States

  • Design and build core platform infrastructure for large-scale cloud-native data and analytics systems.
  • Own and improve CI/CD pipelines, testing frameworks, and deployment in a high-scale PaaS environment.
  • Contribute to reliability engineering, observability, and operational excellence across distributed systems.

Jobgether uses an AI-powered matching process to connect candidates with roles. The company is a growing platform focused on efficient job matching and data privacy compliance.

US 3w PTO

  • Design and operate AWS infrastructure and hybrid connectivity.
  • Stand up and run production-grade Kubernetes clusters on EKS, Rancher, or OpenShift.
  • Implement GitOps workflows with Argo CD and author Helm charts.

BlackSky is a real-time intelligence company that provides satellite imagery and analytics. They have a global team and a culture that is people-first, customer-focused, and fun.

US Unlimited PTO

  • Configure, deploy, and maintain security tools across cloud-native environments.
  • Integrate security tooling into existing software development and deployment workflows.
  • Partner with engineering teams to implement security best practices throughout the software development lifecycle.

Sphinx builds modern, scalable software to solve complex national security problems in Space. Founded by engineers and technologists with deep experience across commercial and defense technology, they emphasize collaboration, transparency, and individual responsibility in a growing team.

Global Unlimited PTO

  • Lead the architecture and implementation of managed Kubernetes infrastructure across AWS, Azure, and GCP for enterprise customer deployments.
  • Own the systems that provision, organize, and manage cloud accounts, including resource governance and multi-tenant isolation.
  • Mentor P3/P4 engineers and define architectural patterns that scale across the company's infrastructure.

Ditto builds the world's leading edge sync platform, enabling applications to share data peer-to-peer with or without internet connectivity. With over $145 million in funding and trusted by organizations like Chick-fil-A and Delta Airlines, Ditto is a fast-growing, globally distributed startup committed to building a diverse and inclusive team.

US

  • Design and manage cloud-based infrastructure on AWS.
  • Create and maintain deployment architectures and continuous delivery pipelines.
  • Automate infrastructure provisioning and management using Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.

Nearform is an independent team of data & AI experts, engineers, and designers who build intelligent digital solutions and capability at pace. Our team of 500 experts in 20+ countries is trusted by leading enterprises.

North America 6w PTO 26w maternity 26w paternity

  • Lead and mentor a team of Forward Deployed Engineers deploying the North platform.
  • Drive end-to-end deployment in private cloud and on-premises environments for customer success.
  • Collaborate with Product, Engineering, and Sales while optimizing cloud infrastructure and K8s services.

Cohere is a security-first enterprise AI company building cutting-edge foundation AI models and end-to-end products for real-world business problems. They are a global technology company with offices in Toronto, San Francisco, London, New York City, Montreal, Seoul, Germany, and Paris, employing a team of researchers, engineers, and designers.

Global 16w maternity 16w paternity

  • Author and maintain image definition files tracking upstream OSS releases and keeping our catalogue current across dozens of images.
  • Adapt upstream Helm charts (e.g., cert-manager, grafana, mongodb) to work with security-hardened images, handling security constraints and Kubernetes compatibility.
  • Write Go-based integration tests that validate images and charts behave correctly in real Kubernetes environments.

Docker is a leading developer tooling brand trusted by over 20 million monthly users and 20 billion container image pulls, providing products like Docker Desktop, Docker Hub, and Docker Scout. The company is a globally distributed, remote-first team with offices in Seattle and Paris, focused on building tools for software delivery.

US

  • Lead global product marketing strategy, including value proposition, messaging, and competitive differentiation.
  • Present at customer executive briefings and industry events, and develop sales enablement materials.
  • Produce content such as eBooks, blog posts, webinars, and partner collateral.

RapidFort provides cloud-native security solutions, focusing on Kubernetes, container security, and DevSecOps. The company is a fast-growing, cutting-edge, high-energy team that values cross-functional collaboration and innovation.

United States Unlimited PTO

  • Apply strong proficiency with Kubernetes, Helm, and containerized applications to deliver robust platform deployments.
  • Deploy and manage UDS environments across AWS EKS, Azure AKS, and on-prem for government-backed vendors.
  • Serve as the primary technical interface for customers, providing guidance and troubleshooting on complex UDS issues.

Defense Unicorns delivers secure software solutions for continuous integration and delivery, focusing on mission-value for government customers. The team is composed of innovators, software engineers, and veterans with decades of experience across the federal market.

Canada Unlimited PTO

  • Solve complex challenges as a trusted technical lead driving a DevOps mindset.
  • Develop and enhance cloud reference architectures and guide teams on implementation.
  • Participate in architectural decisions and build secure, high-performing cloud infrastructure.

Kinaxis is a global leader in modern supply chain orchestration, powering complex global supply chains. With over 2,000 employees worldwide and 6 global offices, we are proud to have won several Top Employer awards and foster a culture of innovation and collaboration.