Key Responsibilities:
- Support the availability and durability of critical services across production environments.
- Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when thresholds are at risk.
- Participate in on-call rotations, incident response, and post-incident reviews to drive service improvements.
Automation & Tooling:
- Develop automation for common operational tasks, reducing manual intervention and toil.
- Contribute to monitoring, logging, and alerting frameworks (e.g., Prometheus, Grafana, Catchpoint, ELK).
- Work with CI/CD pipelines, configuration management, and infrastructure-as-code tools (Terraform, Ansible, Jenkins).
Continuous Improvement:
- Contribute to playbooks, runbooks, and operational documentation.
- Identify recurring issues and propose long-term improvements.
- Promote reliability-focused practices within development and operations teams.
Backblaze
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, the company scaled the business with less than $3 million in outside funding until 2021, and it generates over $100 million in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.