Job Description
Support a global enterprise’s infrastructure operations and drive reliability across complex hybrid environments. This role combines technical depth with operational rigor and close collaboration between business and IT teams to ensure resilient, always-on service delivery.

The Service Level & Availability Manager will own end-to-end Service Level and Availability Management across on-premises, cloud, and third-party systems. Responsibilities include developing and maintaining availability plans aligned with business priorities and risk mitigation strategies, and monitoring system health and performance with tools such as Power BI, Datadog, Splunk, PagerDuty, and ServiceNow. A core part of the role is partnering with Infrastructure, DevOps, and Application teams to embed availability practices into change, incident, and release processes. The manager is also expected to define and track key metrics (uptime, reliability, MTTR/MTTI), present trends and recommendations to technical and leadership audiences, and foster a culture of proactive monitoring, resilience engineering, and continuous improvement.