Source Job

$198,025–$287,952/yr

  • Building tools and applications to extend Calendly’s infrastructure platform
  • Evaluating and deploying cloud native open source tools
  • Exercising expertise in cloud infrastructure concepts and patterns

GCP Golang Python Kubernetes Datadog

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

$141,000–$230,000/yr
US

  • Collaborate with engineering teams to design and implement scalable, secure systems.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Enhance incident response processes and post-mortem analysis for outages.

ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

Europe 5w PTO

  • Work with other Engineering teams to design sustainable infrastructure and microservice solutions.
  • Automate tools and infrastructure to reduce manual work.
  • Monitor applications and participate in an on-call rotation as required.

Bloomreach is building the world’s premier agentic platform for personalization, revolutionizing how businesses connect with their customers by building and deploying AI agents to personalize the entire customer journey. They power personalization for more than 1,400 global brands.

Canada

  • Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
  • Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
  • Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution

Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving their customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.

US Canada

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
  • Participate in an on-call rotation and act as incident commander for high-severity production events.
  • Partner with engineering teams to build reliability into new features before they ship to production.

Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.

$172,614–$172,614/yr
US

  • Design infrastructure, networking, and software platform architecture.
  • Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
  • Troubleshoot infrastructure, internal applications, networking, and security issues.

Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.

Europe

  • Write code, automate everything, design for reliability, and deeply understand the systems.
  • Build or extend Terraform modules and contribute to Platform Engineering around Observability.
  • Collaborate with developers to shape feature design so that reliability is built in, not added later.

InPost Group is an innovative European out-of-home delivery company, revolutionizing the way parcels are delivered to customers. With over 10,000 employees worldwide, InPost Group is one of the largest out-of-home delivery providers in Europe, committed to providing sustainable and efficient delivery solutions.

Global

  • Build and own the foundational infrastructure that our products run upon.
  • Work directly on our products' Golang code base to implement SRE-related objectives.
  • Take a data-driven approach to quantifying system performance and reliability.

LiveKit provides the network infrastructure for multimodal AI interfaces, enabling seamless audio and visual interactions. Founded in 2021, LiveKit supports over 3 billion calls annually, with 100,000+ developers and industry giants like OpenAI, Spotify, and Meta.

US Canada 16w maternity

  • Build and deploy computing services and infrastructure in customer environments.
  • Clarify and surface requirements from ambiguous use cases defined by cross-functional stakeholders.
  • Improve reliability and scalability by resolving edge cases, studying failure modes, and writing tests.

Planet designs, builds, and operates the largest constellation of imaging satellites in history. They deliver an unprecedented dataset of empirical information via a revolutionary cloud-based platform to authoritative figures in commercial, environmental, and humanitarian sectors. Planet has a people-centric approach toward culture and community, and strives to iterate in a way that puts their team members first and prepares their company for growth.

$160,000–$180,000/yr
US

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, capacity planning.
  • Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
  • Manage site stability, performance, reliability, and maintain uptime for production environments.

CentralReach provides autism and IDD care software for Applied Behavior Analysis (ABA), multidisciplinary therapy, and special education. They are trusted by more than 200,000 users and are backed by Roper Technologies, Inc. (Nasdaq: ROP). Their culture is centered around impact, inclusion, and flexibility.

  • Designing, building, and operating Kubernetes infrastructure across multiple cloud providers.
  • Building and maintaining automation for cluster lifecycle management, node provisioning, and provider onboarding.
  • Developing platform tooling and abstractions that enable other Canva engineers to deploy and scale workloads.

Canva is a design platform redefining how the world experiences design. They have campuses in Sydney and Melbourne, along with co-working spaces in Brisbane, Perth and Adelaide, offering a flexible and inclusive work environment.

$133,110–$148,042/yr
US

  • Collaborate with stakeholders to drive best practices for monitoring and CI/CD pipelines
  • Troubleshoot deployment issues in our CI pipeline
  • Identify areas for automation and embrace the codification of all things

Weedmaps is a global leader in the cannabis industry. They are dedicated to transparency, education, and community, serving cannabis to consumers and businesses in the U.S. and worldwide.

Europe

  • Lead the Infrastructure Engineering team, taking full ownership of cloud infrastructure, Kubernetes platforms, DevOps tooling, and CI/CD pipelines.
  • Drive reliability, scalability, and security across the production environment while maintaining a sharp focus on developer velocity and business impact.
  • Mentor and guide engineers across SRE, DevOps, and Database Reliability functions, fostering a culture of operational excellence and pragmatic problem-solving.

Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs with an all-in-one B2B platform. They have raised $346 million, are expanding across key EU markets, and foster innovation, prioritizing research and solutions that benefit users, employees, partners, and the business.

Americas

  • Manage and support infrastructure for Growth teams, including Nomad, Hashistack, databases, and any other underlying systems
  • Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
  • Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments

Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.

Canada

  • Working with engineers across Yelp in supporting new features and services.
  • Integrating tools to monitor platform stability and performance.
  • Helping scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.

Yelp's engineering culture values individual authenticity and encourages creative solutions. They focus on helping users, growing as engineers, and having fun in a collaborative environment.

South Africa

  • Ensure reliability, uptime, and performance across GCP environments.
  • Implement SRE and DevOps best practices with strong focus on automation and scalability.
  • Build and optimize CI/CD pipelines using GCP-native tools.

InspiredXpert is a specialist IT Talent Solutions company providing high-quality contract or permanent talent across software development, cloud, AI, cybersecurity, and data-driven roles. They connect skilled professionals with innovative companies, offering exciting opportunities to work on impactful projects across the globe.

$134,000–$149,000/yr
US

  • Design, implement, and operate cloud-native infrastructure for production workloads.

PointClickCare's mission is to help providers deliver exceptional care. They are a leading health tech company, founder-led and privately held, that empowers their employees to push boundaries, innovate, and shape the future of healthcare. They have the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, and their platform serves over 30,000 provider organizations.

US

  • Collaborate with application engineering teams on platform infrastructure.
  • Enhance observability and spearhead the adoption of SRE best practices.
  • Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.

Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.

  • Build and manage GCP infrastructure across core services.
  • Support and execute migrations from on-premises or multi-cloud environments into GCP.
  • Implement and maintain infrastructure using Terraform for repeatable deployments.

Ontrac Solutions is a technology consulting firm that specializes in cutting-edge solutions driving business transformation. They partner with organizations to modernize infrastructure, streamline processes, and deliver results through innovation, collaboration and excellence.

Europe

  • Design and implement a cloud-native platform architecture on GCP
  • Build scalable guardrails for multi-team, multi-environment setups with compliance requirements
  • Create reusable infrastructure that enables self-service provisioning

InPost Group is an innovative European out-of-home delivery company, revolutionizing the way parcels are delivered to customers. With over 10,000 employees worldwide, they are committed to providing sustainable and efficient delivery solutions to meet the evolving needs of customers.