Source Job

Europe

  • Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
  • Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
  • Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.

AWS Terraform Pulumi TypeScript Python

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

Canada

  • Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
  • Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
  • Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

$150,000–$167,000/yr
US

  • Lead reliability-focused design and readiness reviews.
  • Build, operate, and continuously improve our observability stack.
  • Own and evolve incident management practices.

Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, are backed by top-tier investors, and are proud to serve some of the world's most iconic brands.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

Europe

  • Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
  • Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
  • Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one

Peec AI is one of Europe’s fastest-growing Series A startups. They provide exciting and challenging work in the AI space.

Europe

  • Implement SLI/SLO frameworks with error budgets to drive reliability decisions
  • Design release strategies including blue/green deployments and version tracking
  • Lead incident response and develop automated runbooks to reduce MTTR

Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.

US Canada

  • Maintain tooling, libraries, and infrastructure leveraged by core service teams
  • Develop and maintain infrastructure services that enable engineers to manage, deploy, and scale systems
  • Act as a technical leader, guiding core service teams to design robust and reliable software

StackAdapt is a technology company that empowers marketers to reach, engage, and convert audiences with precision. They are an AI-powered platform connecting brand and performance marketing, recognized for their diverse workplace and high-performing campaigns.

Global Unlimited PTO

  • Build and maintain Infrastructure as Code to power our production systems, Python tools to automate toil, and monitoring systems to detect problems early.
  • Independently execute on large DevOps projects such as major migrations, product rollouts, and infrastructure enhancements.
  • Participate in the infrastructure on-call rotation & incident response process, including triaging alerts, coordinating responders, and contributing to blame-free RCAs. Leverage senior level expertise to drive rapid resolutions.

Super.com aims to maximize the lives of both customers and employees, providing opportunities to unlock potential through learning and impact. They are a fast-paced, high-growth tech company that values career progression and supports employees through various programs.

$120,000–$150,000/yr
US

  • Design, build, and maintain automated CI/CD pipelines to enable fast, secure, and reliable deployments.
  • Provision, manage, and optimize core AWS services to support scalable, highly available applications.
  • Implement and maintain IaC frameworks to ensure infrastructure is version-controlled, repeatable, and auditable.

Arine is a healthcare technology and clinical services company dedicated to ensuring individuals receive the safest and most effective treatment. They are backed by leading healthcare investors and collaborate with top healthcare organizations, managing more than 18 million lives across prominent health plans.

$80,547–$106,026/yr
North America

  • Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
  • Use IaC solutions to enable automated provisioning and ensure consistency across all environments.
  • Design, develop, and maintain advanced pipelines, ensuring automated testing integration and deployment efficiency at scale.

Pagefreezer's vision is to make the Internet a safer place by delivering solutions that transform how people protect integrity online, ensure accountability, and enable the pursuit of justice. They simplify compliance and litigation by automatically archiving websites, social media, mobile text messages, and enterprise collaboration platforms. They have been named Canada’s Most Admired Culture in 2023, 2024, and 2025, one of BC’s Top Employers for 2024, and one of Canada’s Top Small & Medium Employers for 2024.

US

  • Own and scale AWS and Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and infrastructure-as-code.
  • Lead observability and monitoring initiatives.

Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.

  • Design, develop, and implement platform solutions that enhance the reliability, security, and scalability of the Database Platform infrastructure.
  • Provide technical leadership in AWS cloud infrastructure, networking, CI/CD, and security for cloud infrastructure solutions.
  • Mentor and coach team members, fostering a culture of knowledge sharing, technical excellence, and continuous improvement.

SYSTABUILD is building a shared cloud and platform foundation for a group of leading software companies in the construction, CAD, and ERP domains. They are looking for a Lead Cloud Infrastructure Engineer to take a key role in designing, operating, and evolving their central cloud infrastructure and platform services.

Global

  • Design, build, and maintain scalable backend services primarily using Python
  • Develop and operate cloud-native systems on AWS, ensuring reliability, security, and performance
  • Contribute to infrastructure design and automation using Terraform

Smart Working connects skilled professionals with global teams for full-time, long-term roles, breaking down geographic barriers. They value growth and well-being, fostering a genuine community and empowering individuals to thrive in a remote-first world.

Global

  • Design and implement comprehensive monitoring strategies.
  • Take ownership of production incident response: lead incident handling and drive remediation.
  • Continuously improve operational processes, reliability practices, and team readiness.

InvestorFlow delivers industry-specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, who together manage more than $6 trillion in assets.

US

  • Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
  • Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
  • Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.

Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.

Europe

  • Maintaining and updating Glia’s core infrastructure.
  • Troubleshooting and resolving infrastructure-related issues.
  • Improving our security posture.

Glia provides AI customer service solutions for banks and credit unions, unifying AI and human agents across all conversations via their ChannelLess® Architecture. They are valued at over $1 billion, have been named a Deloitte Technology Fast 500™ company for five years, power over 700 financial institutions, and maintain an industry-leading NPS of 72.

Europe

  • Developing infrastructure to support cloud-based applications.
  • Creating deployment architecture and continuous delivery pipelines.
  • Designing high-availability approaches and implementing monitoring architecture.

Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.

Europe

  • Manage cloud infrastructure and optimize costs, particularly in AWS environments using Terraform and Python.
  • Design, develop, and maintain CI/CD pipelines and infrastructure for AI model training and deployment.
  • Ensure platform scalability and efficient resource utilization.

NEORIS, now part of EPAM Systems, is a Digital Accelerator that helps companies step into the future. With more than 20 years of experience as Digital Partners to some of the world’s leading organizations, they have more than 4,000 professionals across 11 countries and foster a multicultural, startup-minded culture that promotes innovation, continuous learning, and the delivery of high-impact solutions for their clients.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.

South America

  • Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.

$165,000–$200,000/yr
US Unlimited PTO

  • Contribute to building and operating the infrastructure that supports the HackerOne platform.
  • Improve the reliability, security, and scalability of our systems.
  • Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. The platform is trusted by the world’s top organizations.