As a Site Reliability Engineer, you will help us design, automate, and scale secure-by-default cloud infrastructure so uptime stays high and on-call stays uneventful. We are seeking an experienced SRE to join our team to develop and maintain cloud-based infrastructure. You will be responsible for designing, building, and scaling robust infrastructure, including observability, metrics, and alerting. You will also ensure our work is sustainable by promoting best practices around deployment, incident management, and disaster recovery.
You will practice continuous improvement by iterating on how services are deployed, configured, monitored, and maintained on our platform. You will also lead incident response, diagnosis, and follow-up on system outages and alerts. You will help develop an operational focus and act as a thought leader for the rest of engineering. You will maintain and optimize infrastructure for performance, scalability, and cost, and analyze system metrics to identify opportunities to improve reliability and efficiency.