Source Job

US Unlimited PTO

  • Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
  • Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
  • Define, deploy, and maintain system and service monitors.

Prometheus Python Terraform

20 jobs similar to Senior Site Reliability Engineer, Observability

Jobs ranked by similarity.

US Canada 6w PTO

  • Work with your team to build and roll out new features, then use the results to iterate and improve.
  • Drive projects from initial ideation all the way to operations once they are in the hands of customers.
  • Maintain critical systems, and own their reliability, performance, and availability.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users. They provide observability strategies for over 3,000 companies, featuring scalable metrics, logs, and traces, and thrive in an innovation-driven environment with transparency, autonomy, and trust.

US Canada Europe

  • Design, build, and maintain highly available, scalable infrastructure.
  • Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
  • Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. The company values growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

US Unlimited PTO

  • Contribute to high-impact AWS cloud infrastructure initiatives.
  • Participate in operability and production readiness reviews.
  • Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

$150,000–$200,000/yr
US Unlimited PTO

  • Architect, maintain, and scale critical infrastructure.
  • Ensure system reliability and optimize performance.
  • Implement modern deployment strategies.

Scribe's Workflow AI platform automatically captures and optimizes workflows so teams work smarter, faster, and more consistently. They are a fast-growing company founded in 2019 with over 5 million users across 600,000 businesses, and they are backed by leading investors.

Global

  • Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
  • Improve reliability, scalability, and security of production and non-production environments.
  • Improve local development environments and onboarding experience for engineers.

Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.

Canada

  • Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services
  • Set the architectural direction of infrastructure and platform services while supporting the engineering organization
  • Design and implement tools and processes used for deployment, change, service, and infrastructure management

Thumbtack helps millions of people confidently care for their homes through personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses.

$189,592–$220,000/yr

  • Responsible for custom architectural design, implementation, monitoring, and maintenance for production application environments.
  • Work with the Principal Software Engineer on technical architecture and design based on customer product requirements.
  • Hands-on commissioning, configuration, administration, documentation, and support for all on-prem & cloud (AWS) environments.

NBCUniversal is one of the world's leading media and entertainment companies creating world-class content across film, television, streaming, theme parks, and consumer experiences. They own leading entertainment and news brands and are a subsidiary of Comcast Corporation, committed to improving communities and fostering an inclusive culture.

$89,155–$287,488/yr
Global

  • Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance
  • Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
  • Instrument services to understand system health.

Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Its mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.

US

  • Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
  • Create highly automated, available and scalable systems by applying software and infrastructure principles
  • Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, it has a globally distributed team and $100M in funding, and is committed to building a diverse and inclusive team.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on
  • Boost development velocity and productivity
  • Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. It has more than doubled its revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

Global

  • Design and implement reliable and scalable AWS architecture.
  • Support the CI/CD process with ArgoCD and GitOps, automating deployments with Terraform.
  • Optimize system performance and troubleshoot issues, collaborating with development teams.

Cloudbeds is transforming hospitality with its intelligently designed platform that powers properties across 150 countries. They are a completely remote team of 650+ employees across 40+ countries, focused on building AI-powered solutions for hotels.

US Unlimited PTO 11w maternity

  • Own and maintain the incident response process, including defining procedures, tools, and best practices
  • Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
  • Lead capacity planning initiatives, focusing on both short and long-term scalability while optimizing costs

Underdog makes sports more fun by building the best products for sports fans. It is a fast-growing sports company valued at $1.3B, focused on a seamless, simple, easy-to-use, intuitive, and fun app.

$109,800–$252,500/yr
US Unlimited PTO 16w maternity 8w paternity

  • Design, implement, and maintain scalable and reliable infrastructure solutions.
  • Automate deployments and maintain a resilient, secure SaaS application platform.
  • Develop comprehensive monitoring and alerting solutions, and respond to incidents.

Veeam is the #1 global market leader in data resilience. It believes businesses should control all their data whenever and wherever they need it, and provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust it to keep their businesses running.

$126,000–$184,000/yr
US

  • Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
  • Lead automation efforts and architect for reliability.
  • Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.

$145,000–$185,000/yr
US Unlimited PTO

  • Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
  • Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
  • Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.

Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.

Latin America

  • Design, implement, and manage cloud infrastructure using Infrastructure as Code (IaC) tools.
  • Design, build, and maintain scalable CI/CD pipelines using tools like CircleCI or GitHub Actions.
  • Implement and maintain observability tooling (Prometheus, Grafana, Datadog), and lead incident response to ensure system reliability.

Engine is transforming business travel into something personalized, rewarding, and simple. More than 20,000 companies already rely on Engine to support over 1 million travelers and billions in bookings each year.

US

  • Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services.
  • Set the architectural direction of infrastructure and platform services while supporting the engineering organization.
  • Troubleshoot and debug critical systems throughout the SDLC.

Thumbtack helps millions of people confidently care for their homes by offering personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses and value a cross-functional, collaborative culture.

US

  • Understand and participate in the changing FedRAMP space.
  • Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
  • Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.

Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.

US

  • Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
  • Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
  • Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.