Responsibilities:
- Lead efforts to improve system reliability, scalability, and performance across critical services
- Define and implement SLIs/SLOs and error budgets, and use them to guide engineering priorities
- Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise
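As an illustration of how SLIs, SLOs, and error budgets relate, here is a minimal sketch in Python. It assumes a simple request-based availability SLI (good requests divided by total requests). The function names and the example figures are hypothetical and are not taken from UJET's systems.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded in the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 1.0 - failed_requests / total_requests


def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target is the SLO as a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
sli = availability_sli(1_000_000, 250)
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"SLI: {sli:.5f}, error budget remaining: {remaining:.0%}")
```

A team might use a figure like this to guide priorities: while budget remains, feature work proceeds; once it is spent, reliability work takes precedence.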
Qualifications:
- 6 to 10+ years of experience in SRE, infrastructure, or backend systems engineering
- Demonstrated experience owning reliability outcomes for complex, distributed systems
- Strong experience with cloud infrastructure (AWS, GCP, or Azure) and production-scale systems
Success Criteria:
- Critical services have clear, meaningful SLOs that drive engineering decisions
- Alerts are actionable; irrelevant alerts are reduced; on-call workload is manageable
- Incidents are handled efficiently, and repeat issues decline over time
UJET
UJET is an AI-powered contact center innovation company, delivering a cloud platform that redefines the customer experience. The company is built on a cloud-native architecture and partners with businesses to deliver exceptional interactions and accelerated growth in an AI-driven world.