Source Job

UK

Run the production environment by monitoring availability and taking a holistic view of system health. Build software and systems to manage platform infrastructure and applications. Improve reliability, quality, and time-to-market of our suite of software solutions.

Python Go Java AWS Kubernetes

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

Design, implement, monitor and maintain Sysdig's Infrastructure at scale on different clouds and on-prem. Collaborate with development teams to improve system reliability, performance, and scalability. Participate in on-call rotation, respond to incidents, conduct root cause analyses, and implement preventive measures.

Sysdig helps organizations secure innovation in the cloud with runtime insights, open innovation, and agentic AI, trusted by over 60% of the Fortune 500.

Responsible for automating infrastructure, maintaining system reliability, and bridging the gap between operations and database management. Design, deploy, and manage scalable infrastructure on Google Cloud Platform (GCP). Implement and maintain CI/CD pipelines for seamless deployment.

Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation to support digital transformation.

$95,000–$110,000/yr
US

  • Become a member of a highly collaborative engineering team offering a unique blend of Cloud Infrastructure Administration, Site Reliability Engineering, Security Operations, and Vulnerability Management.
  • Coordinate with client product teams, engineering team members, and other stakeholders to monitor and maintain a secure and resilient cloud-hosted infrastructure to established SLAs.
  • Innovate and implement using automated orchestration and configuration management techniques.

Coalfire is on a mission to make the world a safer place by solving our clients’ toughest cybersecurity challenges.

Germany

Shape the way Scalable runs microservices in a performant, secure, and cost-efficient way. Collaborate with cross-functional teams to understand scalability requirements. Develop and maintain internal tooling around Monitoring, Developer Portal, and Load Testing.

Scalable Capital is a leading digital investment and banking platform with a full banking licence, empowering people across Europe to shape their own finances.

Canada 5w PTO

Design, implement, and evolve large-scale, cloud-native infrastructure supporting MariaDB's global SaaS platform. Lead reliability and scalability initiatives, driving automation and resilience through infrastructure-as-code and GitOps practices. Proactively identify and remediate systemic reliability issues, ensuring high service availability and performance across multi-cloud environments.

MariaDB is making a big impact on the world and is the backbone of applications used every day, including by 75% of Fortune 500 companies.

$125,000–$169,000/yr
Unlimited PTO

  • Design, scale, and operate resilient, cloud-native infrastructure in AWS with an emphasis on EKS, IAM, RBAC, and modern security-first practices.
  • Build and optimize CI/CD pipelines with GitHub Actions and GitHub Advanced Security enabling velocity without compromising safety.
  • Own observability across the stack using Datadog (metrics, logging, alerting, and tracing).

DexCare optimizes time in healthcare, streamlining patient access, reducing waits, and enhancing overall experiences. They are committed to creating an inclusive workplace where diversity drives innovation and belonging strengthens collaboration, enabling everyone to thrive.

India

  • Oversee the reliability, scalability, performance, and security of key production services.
  • Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
  • Provide expert mentorship and guidance on best practices to engineers throughout the organization.

Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.

$120,032–$164,368/yr
UK

As a Platform Engineer, enhance and maintain foundational tools and systems, working hands-on with Kubernetes clusters and AWS infrastructure. Build and maintain services that abstract and orchestrate our infrastructure, designing and implementing backend services like APIs and controllers. Develop software for complex projects, and manage infrastructure migrations and security tooling.

Monzo is on a mission to make money work for everyone, waving goodbye to the complicated ways of traditional banking, offering personal and business bank accounts.

Brazil 26w maternity 4w paternity

Support the evolution of our platform by improving scalability, reliability, observability, and security. Proactively identify bottlenecks and unlock the autonomy of the entire engineering team. Maintain infrastructure & deployment pipelines and collaborate with engineering teams on architectural decisions and production-readiness practices.

Feegow joined the Docplanner Group, a health-tech company, in 2022 and is dedicated to developing innovative solutions for physicians and managers.

Europe

As an SRE, you will be responsible for ensuring the availability, performance and cost effectiveness of these services. You will be working with multiple feature development teams and the BAU/Support team to define and evolve our cloud & on-prem infrastructure & delivery pipelines, improving system observability. You will also proactively identify and mitigate reliability risks.

In 2019, while our founders were working as engineers solving complex cross-domain problems within government organisations, TwinStream was formed.

ANZ

  • Own challenging infrastructure problems end-to-end by understanding how engineers use the platform.
  • Design scalable, maintainable services and contribute to technical proposals.
  • Contribute to the roadmap, highlighting opportunities, validating approaches and helping keep our platform solutions current with cloud best practices.

Canva's intuitive suite of design products is powered by our large distributed infrastructure group, setting large and ambitious goals.

India Unlimited PTO

Seeking an experienced Site Reliability Engineer to help build highly resilient and scalable systems by automating, measuring, and monitoring everything. Implement highly-available and scalable architectures for core and third-party components of Acquia Source. Implement metrics, monitoring, and incident response processes.

Acquia is an open source digital experience company providing technology to brands that allows them to embrace innovation and create customer moments that matter.

$150,100–$188,100/yr
US Canada 2w PTO 12w maternity 12w paternity

  • Create and test reliable cloud infrastructure services that support Webflow’s range of products.
  • Balance reliability, scalability, and cost efficiency concerns while refactoring and modernizing existing services.
  • Collaborate with product engineering teams to deliver new solutions for services and ways of working that might not exist yet.

Webflow is the leading visual development platform for building powerful websites without writing code.

$95,696–$108,929/yr
AU 5w PTO 12w maternity

  • Share SRE expertise with teams across the company.
  • Keep our build systems running with high reliability and availability.
  • Improve and iterate on our existing reliability practices.

Octopus Deploy sets the standard for Continuous Delivery, empowering software teams to deliver value in an agile way.

  • Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
  • Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
  • Ensure system scalability, availability, and reliability through proactive monitoring and automation.

Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.

$140,000–$190,000/yr
US Canada Unlimited PTO

  • Architect and maintain scalable, reliable infrastructure: Design and optimize infrastructure for high availability, fault tolerance, and performance across distributed systems.
  • Lead incident management and root cause analysis: Own incident response processes, ensure swift resolution of issues, and drive post-incident improvements to prevent recurrences.
  • Service monitoring and automation: Build and maintain automated monitoring, alerting, and healing systems that improve system health, reduce manual intervention, and minimize downtime.

VGS is the world's leader in payment tokenization, empowering clients and partners by tokenizing sensitive payment data and limiting compliance scope. They embed a universal token vault into their technology stack to manage the complexities of payment data tokenization across processors, networks, and more. While the job posting doesn't specify size, they appear to have a culture that values transparency, collaboration, grit, and humility.

$120,000–$140,000/yr

  • Design and plan cloud-native systems aligned with business goals and security best practices.
  • Implement and support AI-based automation tools and services.
  • Continuously tune cloud and automation workloads to improve reliability and performance.

PerfectServe offers unified healthcare communication solutions to help physicians, nurses, and care team members provide exceptional patient care.

India

  • Design, develop, and maintain scalable, secure, and high-performance software solutions.
  • Collaborate with distributed teams across multiple time zones, driving process improvements.
  • Contribute to products used globally, impacting platform reliability and user experience.

This position is posted by Jobgether on behalf of a partner company.

Europe

  • Lead cross-team infrastructure security projects from design to delivery.
  • Design and implement robust security solutions for cloud environments and container platforms.
  • Identify security gaps and remediate systemic security issues in cloud and infrastructure configurations.

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.

Europe

  • Design and implement the "Golden Paths"—standardized, automated templates for microservices and infrastructure.
  • Develop the CLI tools, portals, or API interfaces that abstract the complexity of our cloud infrastructure.
  • Develop and maintain a library of modular, testable, and versioned Terraform modules.

SEON is a command center for fraud prevention and AML compliance, helping companies stop fraud, reduce risk and protect revenue. Powered by real-time, first-party data signals, it enriches customer profiles, flags suspicious behavior and streamlines compliance workflows.