Source Job

$113,082–$175,725/yr
Canada

  • Operate and maintain large-scale data systems, ensuring stability and performance.
  • Design, implement, and optimize deployment processes using virtualization.
  • Monitor system health, analyze failures, and identify instability sources.

SRE DevOps Python Kubernetes Terraform

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

US

  • Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
  • Create highly automated, available and scalable systems by applying software and infrastructure principles
  • Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

US

  • Own developer operations and platform reliability across Introzy’s product stack.
  • Lead how we run infrastructure on Render, design and evolve our observability and alerting, and shape our CI/CD and release practices.
  • Continuously improve internal developer experience so the engineering team can ship quickly and safely.

Introzy is a multi-app platform designed to unify networking, workflow, and productivity. As a subsidiary of Sanguine Technology Solutions, they are an early-stage company moving fast to deliver value, with a lean engineering team and a culture that embraces AI.

India

  • Configure and operate monitoring, logging, and tracing tools for application performance.
  • Build dashboards and automation workflows for system reliability and uptime.
  • Collaborate with software engineering teams to design and implement robust systems.

Jobgether is a platform that uses AI-powered matching to connect job seekers with employers. They ensure applications are reviewed quickly and fairly, then share a shortlist with the hiring company for final decisions.

US Canada Europe

  • Lead a global team of Site Reliability Engineers.
  • Recruit, hire, onboard and develop engineers.
  • Guide project planning by defining milestones and identifying dependencies.

AuthZed creates and maintains SpiceDB and related authorization infrastructure. They are a Series A company with a fully remote team across the US, Canada, and Europe: a hardworking, close-knit group with a software-driven culture that values integrity, collaboration, and open-mindedness.

US Unlimited PTO

  • Contribute to high-impact AWS cloud infrastructure initiatives.
  • Participate in operability and production readiness reviews.
  • Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

Europe

  • Collaborate with the team to design, build, and maintain a robust and scalable infrastructure.
  • Manage and optimize Linux-based systems to ensure high availability and performance.
  • Utilize Kubernetes to orchestrate containers and maintain containerized applications effectively.

As Europe’s No.1 e-pharmacy, Redcare Pharmacy is powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision “Until every human has their health”.

$163,500–$237,500/yr
US

  • Partner closely with data engineering and data science teams to enable reliable data pipelines, analytics, and ML workflows
  • Support, operate, and optimize Databricks and Snowflake environments in production
  • Monitor, troubleshoot, and optimize systems for performance, reliability, and cost efficiency

Life360's mission is to keep people close to the ones they love with their mobile app and Tile tracking devices, empowering members to protect what they care about most with services like location sharing and crash detection. Life360 has more than 750 remote-first employees and enhances everyday family life with seamless coordination.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a globally distributed team and $100M in funding, and are committed to building a diverse and inclusive team.

$150,000–$200,000/yr
US Unlimited PTO

  • Architect, maintain, and scale critical infrastructure.
  • Ensure system reliability and optimize performance.
  • Implement modern deployment strategies.

Scribe's Workflow AI platform automatically captures and optimizes workflows so teams work smarter, faster, and more consistently. They are a fast-growing company founded in 2019 with over 5 million users across 600,000 businesses, and they are backed by leading investors.

$89,155–$287,488/yr
Global

  • Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance.
  • Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
  • Instrument services to understand system health.

Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Their mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.

$126,000–$184,000/yr
US

  • Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
  • Lead automation efforts and architect for reliability.
  • Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.

US

  • Work directly with customers to ensure successful Teleport deployments.
  • Meet regularly with customers, understand pain points blocking deployments and remove roadblocks.
  • Work with customers to articulate the problem they are trying to solve, gather requirements, and make the business case to the product and engineering teams to invest in resolving the issue.

Teleport is the Infrastructure Identity Company, modernizing identity, access, and policy for infrastructure to improve engineering velocity and make critical infrastructure resilient against human error and compromise. They are a fast-growing, well-funded Y Combinator company that values craft, strongly supports work/life balance, and embraces a culture of humility, honesty, and transparency.

Global

  • Design and implement reliable and scalable AWS architecture.
  • Support the CI/CD process with ArgoCD and GitOps, automating deployments with Terraform.
  • Optimize system performance and troubleshoot issues, collaborating with development teams.

Cloudbeds is transforming hospitality with its intelligently designed platform that powers properties across 150 countries. They are a completely remote team of 650+ employees across 40+ countries, focused on building AI-powered solutions for hotels.

US Canada Europe Asia

  • Automate the provisioning of all of Juniper Square’s infrastructure in code.
  • Partner with our Platform Engineering team on building developer tooling and improving developer experience via joint initiatives and enhancements.
  • Partner with our Data Engineering team on improving our data posture and driving operational excellence.

Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.

Europe

  • Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
  • Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
  • Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one

Peec AI is one of Europe’s fastest-growing Series A startups. They offer exciting and challenging work in the AI space.

US

  • Designing & maintaining GCP infrastructure (GKE, Bigtable, BigQuery, GCS, networking).
  • Building monitoring, alerting, logging, and observability from the ground up.
  • Improving our security posture across auth, IAM, policies, and data access.

Software Mind develops solutions that make an impact for companies around the globe. They build cross-functional engineering teams that take ownership and crave more, embracing openness, acting with respect, showing grit & guts and combining employment with enjoyment.

Americas EMEA Unlimited PTO

  • Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
  • Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
  • Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on
  • Boost development velocity and productivity
  • Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

$109,800–$252,500/yr
US Unlimited PTO 16w maternity 8w paternity

  • Design, implement, and maintain scalable and reliable infrastructure solutions.
  • Automate deployments and maintain a resilient, secure SaaS application platform.
  • Develop comprehensive monitoring and alerting solutions, and respond to incidents.

Veeam is the #1 global market leader in data resilience. It believes businesses should control all their data whenever and wherever they need it, and provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running.

$110,000–$135,000/yr
Canada

  • Collaborate with cross-functional teams to build and deliver complex and highly available cloud and platform solutions.
  • Provide recommendations and expertise in cloud and platform provisioning.
  • Develop and maintain infrastructure code for various solutions and provide support to clients or partners.

Smile Digital Health makes data collection and exchange easy for healthcare stakeholders with its FHIR-based data liberation platform. Its health data platform and data management solutions are used in over 20 countries, and the company ranked #19 on Deloitte's Technology Fast 50 Ranking for 2024.