Source Job

Germany

  • Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
  • Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
  • Lead incident response and postmortems in a blameless culture.

GCP Kubernetes Terraform CI/CD

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

Europe

  • Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
  • Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
  • Define and implement our observability strategy across metrics, logs, and tracing.

Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.

$29,000–$36,000/yr
India

  • Design, build, and maintain scalable, reliable systems on GCP.
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.

SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.

$160,000–$200,000/yr
US

  • Drive the stability and reliability of Epic's GCP infrastructure.
  • Manage and harden our Docker and GKE container platform.
  • Maintain and improve CI/CD pipelines.

Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

$115,200–$172,800/yr
US 8w paternity

  • Build internal tooling to help other engineers and the rest of the company understand and operate our system.
  • Design and implement security best practices for our team and infrastructure.
  • Reduce toil through automation, including building and maintaining CI/CD infrastructure.

Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.

Unlimited PTO

  • Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
  • Refine alerts and dashboards for critical services to catch issues earlier.
  • Automate routine checks and monitoring tasks to free up engineers.

PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics from ticketing and streaming to fundraising and merchandise.

Europe

  • Design and implement a cloud-native platform architecture on GCP.
  • Build scalable guardrails for multi-team, multi-environment setups with compliance requirements.
  • Create reusable infrastructure that enables self-service provisioning.

InPost Group is an innovative European out-of-home delivery company, revolutionizing the way parcels are delivered to customers. With over 10,000 employees worldwide, InPost Group is one of the largest out-of-home delivery providers in Europe, committed to providing sustainable and efficient delivery solutions.

Germany 6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.

Brazil Unlimited PTO

  • Collaborate with a tight-knit development team.
  • Design, deploy, and operate critical systems balancing reliability, cost, and agility.
  • Perform troubleshooting and root-cause analysis of system operation issues.

Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.

US

  • Responsible for the overall health, availability, performance, security, cost, and day-to-day operations of the GCP platform and toolset.
  • Build and maintain Azure DevOps pipelines for infrastructure and application deployment.
  • Design, implement, maintain, and operate GCP infrastructure across DEV, QA, STAGE, PROD, etc.

Resultant is a consulting firm that helps clients make technology a strategic asset and use data to guide better decisions. They employ over 350 team members who operate remotely and from offices and hubs around the United States.

SRE

Fal
$180,000–$250,000/yr
US

  • Own and operate our Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and deployment infrastructure.
  • Leverage AI to automate analysis and resolution of production issues.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.

Brazil

  • Maintain and optimize AWS EC2 and EKS clusters to ensure high availability and performance.
  • Lead troubleshooting of production outages, providing timely resolution and root cause analysis.
  • Implement and improve CI/CD pipelines using tools like Jenkins and GitHub Actions to streamline deployment processes.

CI&T are tech transformation specialists uniting human expertise with AI to create scalable tech solutions. With over 8,000 CI&Ters globally, they have built partnerships with more than 1,000 clients over 30 years, and Artificial Intelligence is deeply embedded in their work reality.

  • Design and implement a Kubernetes-based Platform as a Service offering for Sovereign Cloud.
  • Co-create next-generation cloud solutions, with a focus on the EU healthcare, security, and public sector areas.
  • Responsible for the entire lifecycle of Continuous Integration/Continuous Deployment pipelines and Platform as Code approaches.

Deutsche Telekom IT Solutions Slovakia entered the life of the Košice region in 2006 and has been inextricably linked with the region ever since. They have grown into the second-largest employer in the eastern part of the country, with more than 3900 employees.

Mexico

  • Design systems with resilience, graceful degradation, and capacity in mind.
  • Define and measure SLOs and SLIs that actually reflect what our customers feel.
  • Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.

EarnIn is building products that deliver real-time financial flexibility for those with the unique needs of living paycheck to paycheck. They are growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of their growth journey.

Global Unlimited PTO

  • Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads.
  • Automate infrastructure provisioning using Infrastructure as Code tools such as Terraform and CloudFormation.
  • Operate and scale our Kubernetes platform, including autoscaling, ingress, and multi-tenant isolation for enterprise customers.

Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. It is a small team with high ownership, emphasizing automation, collaboration, and transparency.

Europe

  • Designs, develops, tests and implements infrastructure for CI/CD pipelines and IaC.
  • Manages source code, configuration management, release management, build and deployment activities.
  • Consults on and implements new technologies in support of the innovation strategy.

Deutsche Telekom IT Solutions Slovakia entered the life of the Košice region in 2006. They are the second-largest employer in the eastern part of the country, with more than 3900 employees, providing innovative information and communication technology services.

  • Lead and mentor SRE/DevOps engineers, driving team growth and performance.
  • Ensure system reliability, uptime, and performance across production systems.
  • Implement DevOps and SRE best practices with a focus on automation and scalability.

InspiredXpert is a specialist IT Talent Solutions company providing high-quality contract or perm talent across software development, cloud, AI, cybersecurity, and data-driven roles. We connect skilled professionals with innovative companies, offering exciting opportunities to work on impactful projects across the globe.

US Unlimited PTO

  • Support the Platform Infrastructure by managing container environments on EKS, implementing GitOps workflows, and maintaining CI/CD pipelines.
  • Build for Reliability by defining SLIs/SLOs, leading incident response, and contributing to disaster recovery planning.
  • Drive Observability by designing and maintaining monitoring and logging stacks with Datadog, Sentry, and CloudWatch.

Turquoise Health is a Series C price transparency platform for finance leaders across healthcare, building the infrastructure for a more open, efficient healthcare marketplace. The company is a remote-first, US-based team of over 300 enterprise organizations that values transparency, empathy, inclusivity, creativity, and ownership.

US Unlimited PTO

  • Design, build, and maintain secure CI/CD pipelines supporting cloud-native applications and services.
  • Implement Infrastructure as Code using tools such as Terraform to provision and manage cloud resources.
  • Integrate security controls and best practices into the software development lifecycle (DevSecOps).

540 is a forward-thinking company that the government turns to in order to #getshitdone. They break down barriers, build impactful technology, and solve mission-critical problems.

Europe

  • Own, maintain, and improve CI/CD pipelines and internal delivery tooling.
  • Design and develop reusable Jenkins and GitLab pipeline templates and automation frameworks.
  • Drive automation across deployments, upgrades, configuration management, and technical documentation.

Everbridge offers a SaaS-based platform for critical event management, helping to aggregate and assess threat data, locate people, automate communications, and track response plans. They have over 1300 employees worldwide supporting over 6000 global customers.