We’re hiring a Staff Software Engineer, Site Reliability, to lead reliability across our production platform. As a Staff‑level individual contributor, you will drive strategy and hands‑on execution across incident response, SLO/SLI programs, and production readiness. You will directly own highly available services in AWS while partnering with Platform/Infra to build paved‑road tooling in our monorepo.
Responsibilities include owning the company‑wide incident lifecycle; defining and driving SLIs/SLOs for core services; leading production readiness reviews; embedding security into the delivery pipeline; and building and evolving paved roads for deploys, config, and runtime operations in our monorepo (Bazel) and CI/CD (AWS CodePipeline/CodeBuild).