Source Job

Unlimited PTO

  • Build and operate cutting-edge cloud infrastructure to support Diagrid's core products
  • Define standards and deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available
  • Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds

Kubernetes Terraform Kafka Python Go

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

Global

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

$205,000–$270,000/yr
US Unlimited PTO

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Focus on automation so we can spend energy where it matters.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

Europe Middle East Africa

  • Design, deploy, and maintain cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure, and GCP
  • Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
  • Automate all technical operations as much as possible

Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.

$4,000–$5,000/mo
Latin America

  • Design and evolve production environments, define standards and best practices.
  • Partner with engineering and IT teams to build scalable, reliable systems.
  • Lead incident response practices, and set guardrails around security, reliability, and cost management.

They are looking for a Senior Site Reliability Engineer who can own the architecture, governance, and cost efficiency of their cloud and platform infrastructure. This is a remote contractor role for candidates located in LATAM.

US

  • Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
  • Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
  • Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.

Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.

$150,000–$167,000/yr
US

  • Lead reliability-focused design and readiness reviews.
  • Build, operate, and continuously improve our observability stack.
  • Own and evolve incident management practices.

Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, backed by top-tier investors and are proud to serve some of the world's most iconic brands.

Global

  • Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
  • Lead the design and implementation of infrastructure bootstrap orchestration, including: Automated cluster and environment provisioning.
  • Apply and promote SRE practices across the platform, including: Clear ownership and runbooks for platform components.

Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.

South America

  • Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

$165,000–$200,000/yr
US Unlimited PTO

  • Contribute to building and operating the infrastructure that supports the HackerOne platform.
  • Improve the reliability, security, and scalability of our systems.
  • Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. The platform is trusted by the world’s top organizations.

US

  • Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
  • Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
  • Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.

CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.

US

  • Own and scale AWS and Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and infrastructure-as-code.
  • Lead observability and monitoring initiatives.

Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.

$160,000–$200,000/yr
US

  • Help drive reliability, automation and performance within our cloud-based infrastructure.
  • Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
  • Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.

$146,200–$212,000/yr
US Unlimited PTO

  • Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
  • Implement SRE principles to improve system reliability and reduce downtime.
  • Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.

Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.

Europe

  • Implement SLI/SLO frameworks with error budgets to drive reliability decisions
  • Design release strategies including blue/green deployments and version tracking
  • Lead incident response and develop automated runbooks to reduce MTTR

Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.

Hungary

  • Build, operate, and continuously evolve ultra-resilient, cloud-native platforms utilizing Kubernetes, Docker, and advanced container orchestration.
  • Eliminate manual toil by engineering heavily automated infrastructure using robust Infrastructure as Code (IaC) tools like Terraform and Ansible.
  • Implement and operate cutting-edge CI/CD pipelines optimized for the rapid, secure deployment of mission-critical software, APIs, and AI/ML models.

Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group and was Hungary’s most attractive employer in 2025, according to Randstad’s representative survey. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees.

Canada 6w PTO

  • Provide and own automation of the provisioning of CSP resources, including networking, Kubernetes clusters and specific CSP resources required by our application teams.
  • Work with users (Grafana Cloud application teams) to help understand their needs and ensure investment in the right capabilities.
  • Participate in the Platform department Infrastructure wing on-call rotation.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.

$118,000–$158,000/yr
US

  • Monitor and troubleshoot system performance, reliability, and availability issues using modern observability tools and techniques.
  • Design, implement, and maintain scalable and reliable infrastructure using containers, Kubernetes, and microservices architecture.
  • Manage CI/CD pipelines to facilitate efficient software development and deployment processes.

Ooma empowers people to connect in smarter ways by creating powerful communication experiences through its cloud-based platform, bringing people together at work and at home. Ooma's solutions help small business owners stay connected with their customers and manage their businesses from anywhere.

$150,000–$185,000/yr
US

  • Partner with Sales to identify business opportunities aligned with our platform.
  • Collaborate with customers to understand their CI/CD architecture and workflows.
  • Advise customers how to migrate from their existing solutions to Buildkite’s uniquely differentiated platform.

Buildkite's mission is to unleash the potential of every developer on the planet. They provide a fast, reliable, and secure platform used by software engineering teams, including Canva, Slack, Uber, and Airbnb. They value kindness, autonomy, and collaboration and foster an inclusive and innovative culture.