Role Overview:
- Improve the reliability, resilience, and operational readiness of our services.
- Work closely with engineering teams to improve system design and operational excellence.
- Help prevent incidents, lead response efforts, and drive improvements through post-mortems.
Responsibilities:
- Define and maintain SLIs, SLOs, SLAs, and error budgets to guide reliability decisions.
- Improve monitoring, alerting, dashboards, tracing, and runbooks for critical services.
- Make production issues easier to detect, troubleshoot, and resolve.
Who you are:
- You have experience designing and operating scalable, reliable systems in AWS or a similar cloud environment.
- You have handled on-call shifts for critical systems.
- You are able to dive in and debug live production systems.
Newton
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving its customers the tools and knowledge needed to navigate the crypto world. The team is remote, spread across Canada, and values pushing boundaries and getting things done.