Source Job

Global

  • Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
  • Monitor systems proactively and respond effectively to production incidents.
  • Drive improvements in MTTR (Mean Time to Resolution).

Azure Kubernetes Terraform PowerShell Bash

20 jobs similar to Site Reliability Engineer

Jobs ranked by similarity.

$98,583–$138,016/yr
US Unlimited PTO

  • Respond to production incidents and contribute to post-incident analysis.
  • Identify and automate manual processes to improve efficiency and reduce risk.
  • Enhance monitoring tools and platforms to improve observability.

Restaurant365 is a SaaS company that provides a unique, centralized solution for accounting and back-office operations for restaurants. They focus on empowering team members to produce top-notch results while elevating their skills.

$150,000–$185,000/yr
Global

  • Design, build, and maintain scalable Azure-based infrastructure (AKS, App Services, Virtual Machines, Functions, etc.).
  • Optimize cloud usage to balance performance, cost, and reliability.
  • Implement Infrastructure as Code (IaC) using Terraform/Bicep for consistent and repeatable deployments.

Orchestry is a rapidly growing SaaS company in the Microsoft 365 ecosystem, helping organizations simplify, govern, and automate their digital workplace. They work globally with partners and enterprise customers and operate as a fully remote company by design, intentionally building the foundation for the future of the company including their people, leadership capability, and culture.

$126,000–$184,000/yr
US

  • Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
  • Lead automation efforts and architect for reliability.
  • Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.

US

  • Designs and maintains CI/CD pipelines using GitLab CI/CD.
  • Implements Infrastructure as Code (IaC) with tools like Terraform.
  • Automates complex workflows and enhances infrastructure scalability.

Everseen is a vision AI solutions provider for global retailers. They have over 900 employees globally, with headquarters in Cork, Ireland, a U.S. headquarters in Miami, and hubs in Romania, Serbia, India, Australia, and Spain.

$4,000–$5,000/mo
Latin America

  • Design and evolve production environments, define standards and best practices.
  • Partner with engineering and IT teams to build scalable, reliable systems.
  • Lead incident response practices, and set guardrails around security, reliability, and cost management.

They are looking for a Senior Site Reliability Engineer who can own the architecture, governance, and cost efficiency of their cloud and platform infrastructure. This is a remote contractor role, and they are seeking candidates located in LATAM.

US Canada Europe

  • Design, build, and maintain highly available, scalable infrastructure.
  • Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
  • Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. They value growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

$110,000–$130,000/yr
US 2w PTO

  • Ensure uptime and performance through monitoring, incident response, and preventive measures.
  • Build and maintain CI/CD pipelines for smooth software releases.
  • Implement security best practices across infrastructure, applications, and data.

ALIS values and promotes diversity. They are an equal opportunity employer.

Slovakia

  • Operate and support Azure-based infrastructure and Rubrik backup solutions.
  • Manage and resolve incidents, changes, and problem tickets related to Azure and Rubrik environments.
  • Contribute to continuous service improvements and automation initiatives.

Deutsche Telekom IT Solutions Slovakia entered the life of the Košice region in 2006. They have grown to be the second largest employer in the eastern part of the country with more than 3900 employees, providing innovative information and communication technology services.

Europe

  • Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
  • Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
  • Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one

Peec AI is one of Europe’s fastest-growing Series A startups. They provide exciting and challenging work in the AI space.

US

  • Contribute to the design and implementation of Infrastructure as Code (IaC) solutions.
  • Build and optimize CI/CD pipelines using GitHub Actions and Azure DevOps.
  • Implement comprehensive monitoring, logging, and alerting strategies using Grafana.

InvestorFlow delivers industry specialized CRM, built on Salesforce, and digital portals. They help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients and are headquartered in San Francisco, California.

Europe Middle East Africa

  • Design, deploy and maintain a cloud infrastructure to support a Dataiku SaaS offering, mainly on AWS, Azure, and GCP
  • Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
  • Automate as much as possible all technical operations

Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.

US Canada Europe Asia

  • Automate the provisioning of all of Juniper Square’s infrastructure in code.
  • Partner with our Platform Engineering team on building developer tooling / improving developer experiences via joint initiatives and enhancements.
  • Partner with our Data Engineering team on improving our data posture and driving operational excellence.

Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.

US

  • Manage and optimize our Azure cloud infrastructure.
  • Ensure seamless CI/CD pipelines.
  • Maintain the security and efficiency of our Linux-based systems.

The company, which describes itself as dynamic, is seeking a skilled and experienced Azure DevOps Engineer.

$165,000–$200,000/yr
US Unlimited PTO

  • Contribute to building and operating the infrastructure that supports the HackerOne platform.
  • Improve the reliability, security, and scalability of our systems.
  • Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. The platform is trusted by the world’s top organizations.

US

  • Design, build, and maintain secure, scalable cloud infrastructure.
  • Own CI/CD pipelines and deployment workflows across services and environments.
  • Improve reliability, availability, and performance through monitoring, alerting, and incident response practices.

Jobgether is a company that uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates and share this short list directly with the hiring company.

Europe 6w PTO

  • Take charge of production and development environments to drive automation and reliability.
  • Turn complex manual steps into simple, repeatable, automated flows and CI/CD pipelines.
  • Own high-impact data and system changes in production, implementing automation and guardrails.

They are a global materials science and digital identification solutions company with locations in over 50 countries. They have approximately 35,000 employees worldwide and are committed to fostering a culture of curiosity and courage.

US

  • Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
  • Create highly automated, available and scalable systems by applying software and infrastructure principles
  • Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

Europe US 5w PTO 16w maternity 6w paternity

  • Design, operate, and continuously improve the cloud infrastructure that powers our systems using infrastructure-as-code, monitoring, and observability.
  • Own critical parts of the platform: identify bottlenecks, propose and implement improvements, and drive reliability and performance at scale.
  • Run Kubernetes in production and evolve how we operate it.

Dune is on a mission to make crypto data accessible. They’re a collaborative multi-chain analytics platform used by thousands of developers, analysts, & investors to understand the on-chain world and the frontiers of finance. They are a team of ~60 employees working together across Europe and eastern US timezones.

US

  • Lead incident response as Incident Commander, coordinating teams, communications, and service restoration
  • Produce executive-level incident reports, run RCAs, and drive continuous improvement
  • Enforce change management and risk assessment for production changes

Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York, delivering top-tier technology solutions to companies of all sizes. Their team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects.

$120,000–$160,000/yr
US 3w PTO

  • Design, build, and maintain shared platform services that support secure and scalable infrastructure across client and internal environments.
  • Develop and maintain infrastructure-as-code (IaC) using tools such as Terraform, ARM/Bicep, or similar frameworks.
  • Build automation for system provisioning, configuration management, patching, and lifecycle operations.

Sentinel Blue is bringing enterprise-class cybersecurity to small and medium sized businesses. They are pushing the envelope of how things are done and constantly seeking innovative ways to meet that mission.