As the SRE Manager at Shippo, you will lead a team of engineers responsible for building platforms, tooling, and infrastructure that enable product teams to operate reliable, performant, and scalable services. You will establish frameworks for observability, deployment automation, and infrastructure management that allow product teams to own their service reliability. You will maintain a strong, support-oriented team while building automation and enabling engineering productivity and operational excellence across the organization.
Responsibilities include leading and developing a team of platform-focused SREs; building and maintaining internal platforms and tooling; managing observability platforms; owning the infrastructure and Kubernetes platform; establishing frameworks and tooling for SLO/SLI definition; designing and maintaining CI/CD pipelines; building infrastructure-as-code foundations; creating automation to eliminate toil; driving infrastructure cost optimization initiatives; participating in the leadership rotation for Sev1 incidents; managing the SRE team's on-call rotation; designing, implementing, and testing disaster recovery capabilities; partnering with Engineering Managers and TPMs; and establishing platform SLOs for infrastructure reliability.