Source Job

$140,000–$180,000/yr
US

  • Conducting current-state infrastructure mapping across application, platform, and hosting layers, documenting findings and recommending improvements.
  • Performing dependency and integration analysis across interconnected enterprise systems.
  • Identifying single points of failure and systemic reliability risks.

Enterprise Architecture, Cloud Migration, Stakeholder Communication

20 jobs similar to Sr. Site Reliability Engineer

Jobs ranked by similarity.

$172,614/yr
US

  • Design infrastructure, networking, and software platform architecture.
  • Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
  • Troubleshoot infrastructure, internal applications, networking, and security issues.

Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.

$120,000–$180,000/yr
US

  • Develop automation code to provision and operate infrastructure at scale.
  • Build resilient, scalable, secure, and observable services with cost optimization.
  • Proactively identify and address security concerns across systems and infrastructure.

Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.

US Unlimited PTO

  • Define long-term architectural strategy for multi-cloud compute and traffic platforms.
  • Provide mentorship to engineers through design reviews and code contributions.
  • Partner with Security to build ‘secure by default’ systems.

Temporal Technologies develops an open-source programming model that simplifies code and enhances application reliability. With a focus on developer experience and open-source software, they foster a culture of curiosity, collaboration, and genuine impact.

$165,000–$200,000/yr
US Unlimited PTO

  • Contribute to building and operating the infrastructure that supports the HackerOne platform.
  • Improve the reliability, security, and scalability of our systems.
  • Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. It is trusted by the world’s top organizations.

US

  • Set the vision and drive execution for Reliability Engineering at Affirm
  • Build software and program management structure to perform continual risk management across the entire Affirm system and Engineering organization
  • Hire and build a global team of SREs, system engineers, and full stack engineers

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company with competitive benefits anchored to their core value that people come first.

$160,000–$200,000/yr
US

  • Help drive reliability, automation and performance within our cloud-based infrastructure.
  • Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
  • Debug production issues across services and levels of the stack, and practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.

Europe

  • Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
  • Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
  • Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.

Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.

Unlimited PTO

  • Build and operate cutting-edge cloud infrastructure to support Diagrid's core products
  • Define standards, deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available
  • Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds

Diagrid believes that open-source software, open standards and APIs are the greatest transformational tools for organizations. They provide developers with APIs and tools that help them focus on their code and not on infrastructure and are founded by the creators of the Dapr and KEDA open-source projects.

LATAM

  • Monitor production systems, dashboards, logs, and alerts to ensure high availability and performance across distributed environments.
  • Assist in incident detection, triage, escalation, and resolution, following structured on-call rotations with mentorship support.
  • Maintain, follow, and continuously improve runbooks, operational procedures, and incident response workflows.

Jobgether is a platform that helps job seekers find the right opportunities. They use an AI-powered matching process to ensure applications are reviewed quickly and fairly.

$140,000–$180,000/yr
Americas Unlimited PTO 16w maternity

  • Build and scale infrastructure to support billions of messages per day and real-time events
  • Automate deployments, alerting, and incident response
  • Tune MySQL and other datastore performance and improve reliability across distributed systems

Customer.io's platform enables over 8,000 companies, from scrappy startups to global brands, to send billions of automated emails, push notifications, in-app messages, and SMS every day. They foster a culture that values empathy, transparency, and responsibility.

$141,000–$230,000/yr
US

  • Collaborate with engineering teams to design and implement scalable, secure systems.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Enhance incident response processes and post-mortem analysis for outages.

ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

North America

  • Drive root cause investigations
  • Produce professional root cause analysis documentation for customers
  • Ensure the prioritization, planning, and execution of problem resolutions

ServiceNow is a global market leader, bringing innovative AI-enhanced technology to customers. They have over 8,100 customers, including 85% of the Fortune 500®, and their intelligent cloud-based platform connects people, systems, and processes.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

Canada

  • Set the vision and drive execution for Reliability Engineering.
  • Build software and program management structure to perform continual risk management.
  • Hire and build a global team of SREs, system engineers, and full stack engineers.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company that values learning, experimentation, and accountability.

Global

  • Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
  • Lead the design and implementation of infrastructure bootstrap orchestration, including automated cluster and environment provisioning.
  • Apply and promote SRE practices across the platform, including clear ownership and runbooks for platform components.

Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.

$171,400–$367,200/yr
Global Unlimited PTO

  • Own and drive the architectural direction for critical infrastructure platforms that support GitLab at global scale.

GitLab is the intelligent orchestration platform for DevSecOps. They enable organizations to increase developer productivity, improve operational efficiency, reduce security and compliance risk, and accelerate digital transformation. GitLab has a high-performance culture driven by their values.

South America

  • Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.

US

  • Define and execute the reliability engineering roadmap.
  • Establish SLO/SLI/error budget frameworks for system stability.
  • Drive continuous improvement through DORA metrics and analysis.

Jobgether leverages AI for HR solutions. They focus on connecting talent with opportunities, using AI-driven matching to ensure fair and objective application reviews.

US

  • Lead and mentor a team responsible for managing and maintaining the company's IT infrastructure.
  • Collaborate with cross-functional teams to define IT strategies, roadmaps, and solutions aligned with business objectives.
  • Develop and implement IT policies, procedures, and standards to ensure security, availability, and performance of IT systems.

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, they help enterprises deliver on the promise of digital transformation. At AHEAD, they prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard.