Key Responsibilities:

Support the availability and durability of critical services across production environments.
Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
Participate in on-call rotations, incident response, and post-incident reviews to drive service improvements.

Automation & Tooling:

Develop automation for common operational tasks, reducing manual intervention and toil.
Contribute to monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK).
Work with CI/CD pipelines, configuration management, and infrastructure-as-code tools (e.g., Terraform, Ansible, Jenkins).

Continuous Improvement:

Contribute to playbooks, runbooks, and operational documentation.
Identify recurring issues and propose long-term improvements.
Promote reliability-focused practices within development and operations teams.

Backblaze

Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, the company scaled the business with less than $3 million in outside funding until 2021 and generates over $100 million in revenue, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
