Incident Response & Operations:
- Act as a primary or escalation responder
- Lead or support Major Incident response
- Drive blameless post-incident reviews
Monitoring, Alerting & Observability:
- Own service health monitoring across infrastructure
- Design and maintain alerting strategies
- Build dashboards using tools such as Grafana, Prometheus, Datadog, Splunk, and CloudWatch
Reliability Engineering & Automation:
- Automate repetitive operational tasks
- Improve mean time to detect (MTTD) and mean time to resolve (MTTR)
- Implement self‑healing and auto‑remediation where possible
Platform & Infrastructure Support:
- Support and troubleshoot Linux‑based systems
- Assist with capacity planning and availability reviews
- Ensure operational readiness for production releases
NiCE
NiCE Ltd.'s software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. NiCE is consistently recognized as the market leader in its domains. With over 8,500 employees across 30+ countries, it is also recognized as an innovation powerhouse that excels in AI, cloud and digital.