Jobs Similar to Senior Site Reliability Engineer | TangerineFeed

Senior Site Reliability Engineer

EarnIn 23 days ago

Mexico

Design systems with resilience, graceful degradation, and capacity in mind.
Define and measure SLOs and SLIs that actually reflect what our customers feel.
Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.

Python Go Datadog CloudWatch

20 jobs similar to Senior Site Reliability Engineer

Jobs ranked by similarity.

Senior Site Reliability Engineer

PlayOn 25 days ago

Unlimited PTO

Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
Refine alerts and dashboards for critical services to catch issues earlier.
Automate routine checks and monitoring tasks to free up engineers.

PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics, from ticketing and streaming to fundraising and merchandise.

Staff Site Reliability Engineer - Site Experience

Reddit 11 days ago

Europe

Lead Reliability Engineering for User Experience.
Architect for Scale, partnering with product and infrastructure teams to design highly available systems.
Drive Automation to eliminate repetitive operational work through tooling and systems.

Reddit is a community-based platform where users submit, vote, and comment on various topics. It hosts over 100,000 active communities and attracts millions of daily active users, making it one of the largest and most influential internet platforms.

Staff Software Engineer - Grafana Cloud k6

Grafana Labs 23 days ago

Germany 6w PTO

Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.

Site Reliability Engineer

SupplyHouse.com 21 days ago

$29,000–$36,000/yr

India

Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.

SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.

Staff Site Reliability Engineer I EMEA

Remote 19 days ago

$188,550–$212,150/yr

Global Unlimited PTO

Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

Site Reliability Engineer 3

Granicus 21 days ago

Global

Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on-call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.

Granicus builds and maintains technology that is transforming the Govtech industry by bringing governments and their constituents together. They serve 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers, and are known for being one of the best companies to work for.

Senior Site Reliability Engineer, Infrastructure Foundations

Wikimedia Foundation 23 days ago

$113,082–$175,725/yr

US Global

Performing day-to-day operational/DevOps tasks on Wikimedia’s public facing infrastructure.
Implementing and utilizing configuration management and deployment tools.
Leading continuous improvement by automating the installation, configuration, and maintenance of services on our platform.

The Wikimedia Foundation operates Wikipedia and other Wikimedia free knowledge projects with the vision of a world where every single human can freely share in the sum of all knowledge. As a charitable, not-for-profit organization, it relies on donations and has staff members based in 40+ countries.

Senior Site Reliability Engineer

Loadsmart 15 days ago

Brazil Unlimited PTO

Collaborate with a tight-knit development team.
Design, deploy, and operate critical systems balancing reliability, cost, and agility.
Perform troubleshooting and root-cause analysis of system operation issues.

Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.

Lead SRE/DevOps Engineer

Launch Potato 22 days ago

$160,000–$190,000/yr

US

Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.

Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. They are headquartered in South Florida with a remote-first team spanning over 15 countries, with a high-growth, high-performance culture.

Staff Platform Engineer

Topstep 27 days ago

$205,000–$235,000/yr

US

Provide technical leadership for infrastructure, reliability, and observability.
Own the observability stack using Datadog and CloudWatch.
Design and evolve AWS infrastructure for reliability, security, scalability, and cost efficiency.

Topstep is an engaging working environment that ranges from fully remote to hybrid. They foster a culture of collaboration by keeping cameras on during meetings and maintaining a robust Slack environment for communication.

Senior Software Engineer, Infrastructure

Epic 25 days ago

$160,000–$200,000/yr

US

Drive the stability and reliability of Epic's GCP infrastructure.
Manage and harden our Docker and GKE container platform.
Maintain and improve CI/CD pipelines.

Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.

Software Engineer - Grafana Cloud Integrations

Grafana Labs 19 days ago

$55,778–$69,695/yr

Europe 6w PTO

Develop and maintain features as part of Observability solutions in Grafana Cloud.
Contribute to the design and implementation of high-quality, scalable integrations for various infrastructure components, databases, and applications.
Build prototypes and present your ideas as part of a cross-functional team.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and thrive in an innovation-driven environment with a global collaborative culture.

Senior Site Reliability Engineer

Redcare Pharmacy 8 days ago

Germany

Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
Lead incident response and postmortems in a blameless culture.

Redcare Pharmacy is Europe’s No.1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision “Until every human has their health”.

Senior Backend Engineer - Application Core Services, Stacks

Grafana Labs 23 days ago

$154,445–$185,334/yr

US 6w PTO

Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

Backend Engineer - Platform - Stacks

Grafana Labs 17 days ago

Europe 6w PTO

Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

Infrastructure Engineer (Observability)

Lightning AI 19 days ago

$180,000–$200,000/yr

US

Own and evolve a scalable observability platform spanning metrics, logs, traces, and events.
Design telemetry pipelines ingesting data from GPUs, CPUs, networking, containers, APIs, and BMC/Redfish.
Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load.

Lightning AI builds an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. They combine developer-first software with cost-efficient, large-scale compute, serving solo researchers, startups, and large enterprises.

Site Reliability Engineer II

Openly 10 days ago

$115,200–$172,800/yr

US 8w paternity

Build internal tooling to help other engineers and the rest of the company understand and operate our system.
Design and implement security best practices for our team and infrastructure.
Reduce toil through automation, including building and maintaining CI/CD infrastructure.

Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.

SRE

Fal 12 days ago

$180,000–$250,000/yr

US

Own and operate our Kubernetes infrastructure.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate analysis and resolution of production issues.

Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.

Engineering Leader, Infrastructure Platform

Horizon3.ai 26 days ago

$240,000–$290,000/yr

US Unlimited PTO

Lead software engineering teams providing infrastructure-as-code to manage cloud infrastructure.
Hire experienced site reliability staff and a line manager to grow and oversee the SRE team.
Establish design-before-build discipline; facilitate lightweight design documents, architectural decision records, and working group reviews.

Horizon3.ai is a cybersecurity company dedicated to enabling organizations to proactively find, fix, and verify exploitable attack vectors. They are a fast-growing company with a culture of respect, collaboration, ownership, and results.

Senior Cloud Infrastructure Engineer

Dragos 15 days ago

$165,000/yr

US

Design, build, and maintain scalable cloud infrastructure services in AWS and GCP.
Contribute production-quality Go and Python code to existing cloud services.
Develop and own automation and software deployment pipelines with maximum efficiency.

Dragos is dedicated to arming customers with best-in-class technology, threat intelligence, and services to protect their systems. They embody core values of authenticity, transparency, and trust and are a remote-first culture with operations in North America, Europe, the Middle East, and APAC.
