Source Job

US Canada

  • Own and evolve AWS infrastructure using Terraform, managing EKS clusters, databases, and core services.
  • Maintain CI/CD reliability and developer tooling across the full engineering org.
  • Lead incident response, drive post-incident reviews, and improve monitoring and alerting standards.

Terraform AWS Kubernetes Ruby on Rails Observability

20 jobs similar to Staff Engineer, Site Reliability

Jobs ranked by similarity.

US 5w PTO

  • Design and develop CI/CD systems for websites, services, and release workflows, and operate an EKS-based Kubernetes platform.
  • Diagnose and debug production incidents, drive root-cause analysis, and implement improvements to enhance system reliability.
  • Write and maintain infrastructure as code using Pulumi or Terraform/OpenTofu across multiple AWS accounts with security-conscious practices.

Thunderbird is one of the world’s most trusted open-source email applications, empowering more than 20 million people globally. Our small but growing distributed team includes 65+ people across seven countries, and we build privacy-respecting communication tools with a collaborative, inclusive, and user-first spirit.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

$113,850–$126,500/yr
Europe 5w PTO

  • Design, build, and maintain infrastructure using Infrastructure as Code tools such as Terraform.
  • Improve system reliability, scalability, resilience, and performance across the Mast platform.
  • Build systems and tooling that automate infrastructure management and operational workflows wherever possible.

Mast is on a mission to make complex lending simple by building modern, cloud-native lending technology purpose-built for specialist lenders. It is a high-performance team of engineers and lending experts that values radical honesty, transparency, and speed.

$160,000–$190,000/yr
US

  • Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
  • Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
  • Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.

Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. They are headquartered in South Florida, with a remote-first team spanning over 15 countries and a high-growth, high-performance culture.

Global

  • Deploy and maintain infrastructure using Terraform on AWS.
  • Operate and govern production-grade platforms running on Kubernetes / EKS.
  • Build and maintain CI/CD pipelines using GitHub Actions.

Muttdata is a dynamic startup committed to crafting innovative systems using cutting-edge Big Data and Machine Learning technologies. They are looking for a hands-on DevOps engineer to join a strategic initiative focused on deploying and operating Data & AI platforms.

$4,313–$5,391/mo
Europe

  • Own 5 AWS accounts across the organisation.
  • Architect and maintain infrastructure as code with Terraform.
  • Set up monitoring, alerting, and incident response.

We're a UK fintech building high-throughput digital infrastructure for the mortgage and property space. We recently acquired Trussle and are taking our platform to the next level. The company values innovation and building high-quality products.

Americas 7w PTO

  • Act as a first responder for system incidents and outages, ensuring high availability and performance.
  • Own and evolve monitoring, alerting, and log management systems while optimizing database infrastructure.
  • Collaborate with engineering teams to build scalable, resilient systems and contribute to SRE tooling and automation.

Circle is building the world's leading all-in-one platform for online communities. We're a fully remote company of around 200 team members from 30+ countries, with a culture that values autonomy, async collaboration, and high expectations.

$210,000–$278,000/yr
US Unlimited PTO

  • Architect future iterations of core systems, addressing scaling requirements.
  • Design and implement developer tools to enhance deployment safety and reproducibility.
  • Drive excellence in monitoring and guide incident response for quick issue resolution.

Found provides tools for self-employed individuals, offering a business bank account that automates taxes and expense tracking. They aim to give self-employed people the security and peace of mind historically available only at large corporations and are looking for kind, resourceful, and passionate people.

$29,000–$36,000/yr
India

  • Design, build, and maintain scalable, reliable systems on GCP.
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.

SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first, guided by Generosity, Respect, Innovation, and Teamwork (GRIT).

US

  • Designing and managing cloud-based infrastructure on AWS.
  • Creating and maintaining deployment architectures and continuous delivery pipelines.
  • Automating infrastructure provisioning and management using Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.

Nearform is an independent team of data & AI experts, engineers, and designers who build intelligent digital solutions and capability at pace. Our team of 500 experts in 20+ countries is trusted by leading enterprises.

Germany

  • Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
  • Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
  • Lead incident response and postmortems in a blameless culture.

Redcare Pharmacy is Europe’s No.1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision “Until every human has their health”.

$145,000–$250,000/yr
US Unlimited PTO

  • Build infrastructure as code, developing and enforcing best practices across configurations while preventing drift between Terraform configurations and deployed infrastructure.

SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.

  • Maintain and develop secure, reliable, and scalable AWS cloud infrastructure to meet business and development needs.
  • Deploy and operate microservices running on EC2 (Docker Compose + Caddy) and Kubernetes (EKS + Karpenter).
  • Write and maintain Terraform modules and stacks for EC2, RDS, EKS, ECR, S3, IAM, VPC, and Secrets Manager.

INFUSE is a digital marketing company headquartered in the US and operating worldwide, providing services in demand generation. Our team is dispersed across 20 countries, and we are committed to giving each candidate a fair and detailed assessment.

UK

  • Design, build, and maintain CI/CD pipelines and Infrastructure as Code using tools like CloudFormation, Ansible, and Terraform.
  • Monitor and respond to infrastructure and application health, troubleshoot operational issues, and provide on-call support.
  • Maintain operational documentation, communicate proactively with teams, and ensure service delivery meets client expectations.

NICE Ltd. provides software used by 25,000+ global businesses, including 85 of the Fortune 100, to deliver customer experiences, fight financial crime, and ensure public safety. With over 8,500 employees across 30+ countries, NICE is recognized as a market leader in AI, cloud, and digital innovation.

$115,200–$172,800/yr
US 8w paternity

  • Build internal tooling to help other engineers and the rest of the company understand and operate our system.
  • Design and implement security best practices for our team and infrastructure.
  • Reduce toil through automation, including building and maintaining CI/CD infrastructure.

Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.

Global 16w maternity 16w paternity

  • Lead the design and implementation of self-service platform infrastructure for provisioning, deployment, and observability across engineering teams.
  • Evolve multi-tenant EKS foundations toward better reliability, security, scale, and multi-region connectivity.
  • Set delivery standards using Terraform, GitOps, and progressive rollout, while improving SLOs and alerting on Grafana Cloud.

Docker is a developer tooling company trusted by over 20 million monthly users and 20 billion container image pulls. They are a globally distributed, remote-first team building tools that define how software gets built and delivered.

US 5w PTO

  • Build and maintain the platform that runs all Close systems.
  • Automate database lifecycles and eliminate static credentials.
  • Improve our multi-region disaster recovery system and reduce downtime.

Close is a bootstrapped, profitable, and fully remote company with a team of thoughtful individuals. They focus on building a CRM that prioritizes better communication for small, scaling businesses and have about 100 employees.

Canada Unlimited PTO

  • Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
  • Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
  • Take part in on-call rotation and drive root-cause resolution and long-term fixes.

ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.

US

  • Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
  • Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
  • Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.

VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.

Canada

  • Own and operate production cloud environments, ensuring high availability, reliability, and performance across distributed systems.
  • Design, build, and maintain scalable infrastructure using automation-first principles and Infrastructure as Code practices.
  • Drive automation initiatives and continuous improvement across infrastructure, deployment, and operational workflows.

Jobgether is an AI-powered job matching platform that connects candidates with hiring companies. They have an inclusive, employee-driven culture with a strong focus on collaboration and innovation.