This is a unique opportunity to join a specialized team focused on observability, system reliability, and operational excellence for our cutting-edge edge-to-cloud database technology.

As a Senior Site Reliability Engineer, you will:

- Develop and maintain observability solutions using platforms like Datadog, Prometheus, and Grafana.
- Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions.
- Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes.
- Work with teams to implement and maintain SLOs, monitoring, and alerting strategies that ensure reliability at scale.
- Design and implement automation and support tooling to improve system resilience, maintain operational safety, and reduce operational overhead.
- Contribute to the development and maintenance of runbooks, alert definitions, and incident response procedures.
- Participate in on-call rotations to provide 24/7 support for critical production systems.