Source Job

US Canada Europe Asia

  • Automate the provisioning of all of Juniper Square’s infrastructure in code.
  • Partner with our Platform Engineering team on building developer tooling / improving developer experiences via joint initiatives and enhancements.
  • Partner with our Data Engineering team on improving our data posture and driving operational excellence.

AWS PostgreSQL Kubernetes Terraform Python

20 jobs similar to Staff Site Reliability Engineer

Jobs ranked by similarity.

Americas EMEA Unlimited PTO

  • Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
  • Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
  • Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on
  • Boost development velocity and productivity
  • Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

US Unlimited PTO 11w maternity

  • Own and maintain the incident response process, including defining procedures, tools, and best practices
  • Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
  • Lead capacity planning initiatives, focusing on both short and long-term scalability while optimizing costs

Underdog makes sports more fun by building the best products for sports fans. They are a fast-growing sports company valued at $1.3B with a focus on a seamless, simple, easy-to-use, intuitive, and fun app.

US Unlimited PTO

  • Contribute to high impact AWS cloud infrastructure initiatives.
  • Participate in operability and production readiness reviews.
  • Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

Global

  • Design and implement reliable and scalable AWS architecture.
  • Support the CI/CD process with ArgoCD and GitOps, automating deployments with Terraform.
  • Optimize system performance and troubleshoot issues, collaborating with development teams.

Cloudbeds is transforming hospitality with its intelligently designed platform that powers properties across 150 countries. They are a completely remote team of 650+ employees across 40+ countries, focused on building AI-powered solutions for hotels.

Europe US 5w PTO 16w maternity 6w paternity

  • Design, operate, and continuously improve the cloud infrastructure that powers our systems using infrastructure-as-code, monitoring, and observability.
  • Own critical parts of the platform: identify bottlenecks, propose and implement improvements, and drive reliability and performance at scale.
  • Run Kubernetes in production and evolve how we operate it.

Dune is on a mission to make crypto data accessible. They’re a collaborative multi-chain analytics platform used by thousands of developers, analysts, & investors to understand the on-chain world and the frontiers of finance. They are a team of ~60 employees working together across Europe and eastern US timezones.

Europe

  • Maintaining and updating Glia’s core infrastructure.
  • Troubleshooting and resolving infrastructure-related issues.
  • Improving our security posture.

Glia provides an AI customer service solution for banks and credit unions, unifying AI and human agents across every voice and digital conversation through its ChannelLess® Architecture. Valued at over $1 billion, Glia powers over 700 financial institutions and is certified as a Great Place to Work, with 98% employee satisfaction.

$150,000–$200,000/yr
US Unlimited PTO

  • Architect, maintain, and scale critical infrastructure.
  • Ensure system reliability and optimize performance.
  • Implement modern deployment strategies.

Scribe's Workflow AI platform automatically captures and optimizes workflows so teams work smarter, faster, and more consistently. They are a fast-growing company founded in 2019 with over 5 million users across 600,000 businesses, and they are backed by leading investors.

US

  • Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
  • Create highly automated, available and scalable systems by applying software and infrastructure principles
  • Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

Global

  • Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
  • Improve reliability, scalability, and security of production and non-production environments.
  • Improve local development environments and onboarding experience for engineers.

Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.

$130,000–$140,000/yr
Global 7w PTO

  • Act as a primary responder for system incidents and outages, ensuring high availability and fast recovery.
  • Own and continuously improve monitoring, alerting, and log management systems.
  • Manage, optimize, and scale database infrastructure including MySQL, PostgreSQL, ClickHouse, and Redis.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

US Unlimited PTO

  • Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
  • Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
  • Define, deploy, and maintain system and service monitors.

ScienceLogic is a leader in IT Operations Management, giving modern IT operations actionable insights for faster problem resolution and prediction. They see everything across cloud and distributed architectures, contextualizing data through relationship mapping, and acting on this insight through integration and automation.

Australia

  • Support and evolve the reliability of platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.

Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. They have raised $150 million in Series D funding, quadrupling their valuation to $2.25 billion, investing in their market-leading platform.

US

  • Work directly with customers to ensure successful Teleport deployments.
  • Meet regularly with customers, understand pain points blocking deployments and remove roadblocks.
  • Work with customers to articulate the problem they are trying to solve, gather requirements, and make the business case to the product and engineering teams to invest in resolving the issue.

Teleport is the Infrastructure Identity Company, modernizing identity, access, and policy for infrastructure, improving engineering velocity and resiliency of critical infrastructure against human factors and/or compromise. They are a fast-growing, well-funded Y-Combinator company that values craft, strongly supports work/life balance, and embraces a culture of humility, honesty, and transparency.

US

  • Make deployments boring (in the best way possible)
  • Own CI/CD pipelines: optimize build times, improve caching, reduce flakiness
  • Evolve our Kubernetes (EKS) deployment strategy for reliability and speed

Obvious is building an AI-native workspace, an operating system for work that puts co-intelligence at the center. They are a small and talent-dense team with world-class builders, former founders, and leaders from companies like Netflix, Google, and Meta.

US Canada Europe

  • Lead a global team of Site Reliability Engineers.
  • Recruit, hire, onboard and develop engineers.
  • Guide project planning by defining milestones and identifying dependencies.

AuthZed creates and maintains SpiceDB and the authorization infrastructure. They are a Series A company with a fully remote team across the US, Canada, and Europe: a hardworking, close-knit group with a software-driven culture that values integrity, collaboration, and open-mindedness.

US

  • Understand and participate in the changing FedRAMP space.
  • Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
  • Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.

Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.

Turkey

  • Is responsible for Insider One's technological well-being and influences the development lifecycle.
  • Develops internal solutions and improves site reliability through continuous delivery and integration.
  • Creates analytical tools for application performance insights and ensures projects are completed on time.

Insider One is a platform that provides marketing and customer engagement tools, enabling teams to reach their full potential. They are a B2B SaaS unicorn with 1,500+ team members representing 50+ nationalities across 30+ offices and are dedicated to social responsibility.

$126,000–$184,000/yr
US

  • Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
  • Lead automation efforts and architect for reliability.
  • Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.