Source Job

$110,000–$175,000/yr
US

  • Become a subject matter expert in applications supporting Ooma customers.
  • Collaborate with Development, QA and other SREs to evaluate, deploy, and debug applications.
  • Improve observability by implementing, refining, and adjusting application monitoring and thresholds.

Linux Python Bash DevOps

20 jobs similar to Site Reliability Engineer

Jobs ranked by similarity.

$172,614–$172,614/yr
US

  • Design infrastructure, networking, and software platform architecture.
  • Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
  • Troubleshoot infrastructure, internal applications, networking, and security issues.

Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.

$141,000–$230,000/yr
US

  • Collaborate with engineering teams to design and implement scalable, secure systems.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Enhance incident response processes and post-mortem analysis for outages.

ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

US

  • Deploy, manage, and secure Ivanti’s production Software-as-a-Service (SaaS) environments in AWS and Azure
  • Automate common and repetitive tasks
  • Participate in on-call rotations for 24x7 coverage (follow-the-sun model) for incident response, issue triage, and problem resolution

Ivanti's mission is to elevate human potential within organizations by managing, protecting and automating technology for continuous innovation. They are committed to building a diverse team and fostering an inclusive environment where everyone belongs.

US

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, capacity planning.
  • Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
  • Manage site stability, performance, reliability, and maintain uptime for production environments.

CentralReach provides autism and IDD care software for Applied Behavior Analysis (ABA), multidisciplinary therapy, and special education. They are trusted by more than 200,000 users and are backed by Roper Technologies, Inc. (Nasdaq: ROP). Their culture is centered around impact, inclusion, and flexibility.

Nigeria

  • Detect and triage service and reliability issues.
  • Develop automation to eliminate manual and repetitive operational tasks.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support.

Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has powered over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.

US

  • Collaborate with application engineering teams on platform infrastructure.
  • Enhance observability and spearhead the adoption of SRE best practices.
  • Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.

Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.

$120,000–$180,000/yr
US

  • Develop automation code to provision and operate infrastructure at scale.
  • Build resilient, scalable, secure, and observable services with cost optimization.
  • Proactively identify and address security concerns across systems and infrastructure.

Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.

4w PTO

  • Work closely with developers on prototyping and designing new features as part of the infrastructure.
  • Deploy, install, configure and maintain sophisticated Trading/Finance and related software.
  • Build & maintain CI/CD pipelines.

Devexperts works with respected financial institutions, delivering products and tailor-made solutions for retail and brokerage houses, exchanges, and buy-side firms. The company focuses on trading platforms and brokerage automation, complex software development projects, market data products, and IT consulting services.

$90,000–$125,000/yr
US 3w PTO

  • Support Engineering and Platform automation efforts with development and scripting skills.
  • Automate operational processes using scripting languages.
  • Develop, implement, and continually improve system and network monitoring and alerting capabilities and procedures.

Cotiviti is focused on providing payment accuracy and analytics-driven solutions that drive measurable results. They offer team members a competitive benefits package and have a culture of valuing individual qualifications without regard to race, gender, or other protected characteristics.

$160,000–$200,000/yr
US

  • Help drive reliability, automation and performance within our cloud-based infrastructure.
  • Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
  • Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

$140,000–$180,000/yr
Americas Unlimited PTO 16w maternity

  • Build and scale infrastructure to support billions of messages per day and real-time events
  • Automate deployments, alerting, and incident response
  • Tune MySQL and other datastore performance and improve reliability across distributed systems

Customer.io's platform enables over 8,000 companies, from scrappy startups to global brands, to send billions of automated emails, push notifications, in-app messages, and SMS every day. They foster a culture that values empathy, transparency, and responsibility.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

Mexico

  • Collaborate with engineers in supporting new features and services.
  • Build tools to monitor site stability and performance.
  • Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry.

Yelp's engineering culture is cooperative and values individual authenticity. They encourage creative solutions to problems, and their engineers help users, grow as engineers, and have fun in a collaborative environment.

US

  • Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
  • Define and instrument user‑centric metrics and objectives in Dynatrace.
  • Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.

AWP Safety's IT Internship Program is a hands‑on learning experience for early‑career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.

Global

  • Design and implement comprehensive monitoring strategies.
  • Take ownership of production incident response, lead incident handling, and drive remediation.
  • Continuously improve operational processes, reliability practices, and team readiness.

InvestorFlow delivers industry specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.

US 6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.

$126,700–$215,400/yr
US

  • Contribute to Configuration Management and Infrastructure as Code for ServiceNow’s global private cloud.
  • Develop tools in Python, bash, and JavaScript to replace manual work and improve customer maintenance experience.
  • Foster a culture of continuous learning and improvement by sharing best practices in engineering and quality.

ServiceNow's technology makes the world work for everyone, and their people make it possible. They are an ambitious team of change makers with a restless curiosity and a drive for ingenuity, serving more than 7,700 customers, including approximately 85% of the Fortune 500®.

South America

  • Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.