The Opportunity:
- The Managed Services team operates shared, production-critical infrastructure that powers Grafana Cloud’s next-generation database products (Mimir, Loki, and Tempo).
- This includes operating 100+ WarpStream clusters across multiple cloud providers and regions.
- The team works closely with high-volume analytical and storage systems.
What You’ll Be Doing:
- As a Senior Engineer, you will take ownership of running systems in production.
- You will invest heavily in developer productivity and use modern AI coding assistants.
- There is an on-call component to this role, and you will work closely with counterparts in other regions.
What Makes You a Great Fit:
- Define SLOs, manage error budgets, and improve the diagnosability of core streaming and database systems.
- Implement solutions that ensure reliability, scalability, and performance of infrastructure.
- Participate in PR review and contribute to design documents, automation, tooling, and code improvements.
Requirements:
- 6+ years of engineering experience in SRE, platform engineering, production engineering, infrastructure engineering, or distributed systems roles.
- Strong Kubernetes experience in AWS, GCP, or Azure.
- Clear communicator who can collaborate across teams and work autonomously.