Source Job

$140,000–$180,000/yr
Americas Unlimited PTO 16w maternity

  • Build and scale infrastructure to support billions of messages per day and real-time events
  • Automate deployments, alerting, and incident response
  • Tune MySQL and other datastore performance and improve reliability across distributed systems

MySQL GCP Terraform Go Bash

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

$175,000–$195,000/yr
Americas Unlimited PTO 16w maternity

  • Lead effective squad rituals and ensure production readiness.
  • Partner with engineers to ensure solutions are scalable, architecturally sound, flexible, and secure.
  • Provide timely, specific coaching and development opportunities for your direct reports.

Customer.io's platform allows over 8,000 companies to send messages using real-time behavioral data. Their team uses Go, React, Ember, and AI to ship fast and scale with confidence, and they value ownership, leadership, and healthy skepticism.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better. Backed by Walmart and Ribbit Capital, it offers an all-in-one financial services platform with banking, savings, credit cards, lending, investing, and crypto services, plus embedded financial services for frontline workers.

$120,000–$180,000/yr
US

  • Develop automation code to provision and operate infrastructure at scale.
  • Build resilient, scalable, secure, and observable services with cost optimization.
  • Proactively identify and address security concerns across systems and infrastructure.

Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.

$150,000–$167,000/yr
US

  • Lead reliability-focused design and readiness reviews.
  • Build, operate, and continuously improve our observability stack.
  • Own and evolve incident management practices.

Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, are backed by top-tier investors, and are proud to serve some of the world's most iconic brands.

$160,000–$200,000/yr
US

  • Help drive reliability, automation and performance within our cloud-based infrastructure.
  • Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
  • Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.

Global

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Provide metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

Nigeria

  • Detect and triage service and reliability issues.
  • Develop automation to eliminate manual and repetitive operational tasks.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support.

Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has powered over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed.

Canada

  • Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
  • Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
  • Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

South America

  • Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.

Europe

  • Apply SRE principles to Customer Success and enable monitoring for key customers.
  • Detect and prioritize critical issues affecting the platform's reliability.
  • Proactively identify and implement improvements that enhance platform performance.

Jobgether is a platform that matches job seekers with companies using AI. They aim to ensure applications are reviewed quickly and fairly, connecting top candidates with hiring companies.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

Mexico

  • Collaborate with engineers in supporting new features and services.
  • Build tools to monitor site stability and performance.
  • Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry.

Yelp's engineering culture is cooperative and values individual authenticity. They encourage engineers to find creative solutions to problems, help users, grow as engineers, and have fun in a collaborative environment.

Europe Middle East Africa

  • Design, deploy, and maintain cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure, and GCP
  • Continuously improve the infrastructure, deployment, and configuration to deliver more reliable, resilient, scalable, and secure services
  • Automate all technical operations as much as possible

Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.

$165,000–$200,000/yr
US Unlimited PTO

  • Contribute to building and operating the infrastructure that supports the HackerOne platform.
  • Improve the reliability, security, and scalability of our systems.
  • Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. It is trusted by the world’s top organizations.

Global

  • Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
  • Lead the design and implementation of infrastructure bootstrap orchestration, including automated cluster and environment provisioning.
  • Apply and promote SRE practices across the platform, including clear ownership and runbooks for platform components.

Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.

US

  • Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
  • Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
  • Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.

Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.

India 3w PTO

  • Own and architect high-availability MySQL database platforms supporting critical business systems.
  • Lead incident response for critical database outages, coordinate cross-functional teams, and drive post-incident reviews.
  • Lead performance engineering initiatives including query optimization, indexing strategies, and schema design reviews.

LivePerson is a leader in trusted enterprise conversational AI and digital transformation. They connect the world's leading brands with millions of consumers, powering nearly a billion conversational interactions every month and are recognized as a top innovative company.

$146,200–$212,000/yr
US Unlimited PTO

  • Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
  • Implement SRE principles to improve system reliability and reduce downtime.
  • Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.

Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.

$205,000–$270,000/yr
US Unlimited PTO

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Focus on automation so we can spend energy where it matters.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

$146,000–$178,000/yr
US Unlimited PTO

  • Design, build, and maintain systems for declarative application and infrastructure lifecycle management.
  • Automate infrastructure provisioning and application deployments using infrastructure-as-code (IaC) tools and deployment patterns.
  • Implement monitoring and observability solutions to proactively identify and resolve performance bottlenecks.

Clover Health is reinventing health insurance by combining data and human empathy to keep members healthier. They've created custom software and analytics to empower their clinical staff to provide personalized care to those who need it most; diversity and inclusion are key to their success.