Lead and expand a high-performing, distributed Production SRE team, contributing directly to the service provider network automation and infrastructure reliability efforts that help us scale safely. Drive execution with ownership and accountability for bare-metal and network reliability while championing a customer-first mindset and positioning your team as the central source of situational awareness.
Implement network automation to replace manual configuration and reduce operational toil. Support incident management, including on-call rotations and postmortem processes, and partner with other engineering leaders on collaborative reliability initiatives that keep operations running smoothly. Foster data-driven decision-making and measurable outcomes for infrastructure components to mature our SRE practices and governance as we scale. Lead the cultural and technical transformation from traditional operations to modern SRE practices.