Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open-source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.
Build distributed systems that support reliability, resiliency, and safe operation at scale.
Design and operate traffic control mechanisms: circuit breakers, rate limiting, admission control, backpressure, and graceful degradation.
Develop tooling that improves incident detection, response, and automated mitigation.
Whatnot is the largest live shopping platform in North America and Europe to buy, sell, and discover the things you love. They are a remote co-located team, inspired by innovation and anchored in their values.
Collaborate with exceptional engineers on building systems and services for the world's largest companies.
Lead architecture for distributed services at scale that synchronize shared state across clients.
Drive cross-team technical alignment via design docs and decision records; unblock execution across org boundaries.
Webflow is building the world’s leading AI-native Digital Experience Platform. They are a remote-first company built on trust, transparency, and creativity, empowering teams to design, launch, and optimize for the web without barriers.
Partner closely with product engineering squads (embedded model).
Own production reliability for high-SLA and complex customer environments.
Design and implement automation to scale our reliability practices.
Grafana Labs is a remote-first, open-source powerhouse that helps more than 3,000 companies manage their observability strategies. They are scaling fast and staying true to what makes them different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.
Provide and own the automation for provisioning CSP resources, including networking, Kubernetes clusters, and other CSP resources required by our application teams.
Work with users (Grafana Cloud application teams) to help understand their needs and ensure investment in the right capabilities.
Participate in the Platform department Infrastructure wing on-call rotation.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.
Helping internal engineers release software securely and measurably.
Leading automation of release processes using ‘golden path’ techniques.
Supporting diverse internal teams from application development to security.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users globally. It helps more than 3,000 companies manage their observability strategies, and its team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything.
Develop automation to eliminate manual and repetitive operational tasks.
Investigate and resolve customer complaints escalated beyond L1 and L2 support.
Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has served over 3 million people, offering personal and business banking, payment, credit, and business management tools to help them succeed.
Develop automation code to provision and operate infrastructure at scale.
Build resilient, scalable, secure, and observable services with cost optimization.
Proactively identify and address security concerns across systems and infrastructure.
Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.
Design and develop the platform architecture that enables developers to self-serve when building out infrastructure.
Collaborate with development teams to ensure their applications are optimized for deployment in an IaC environment and meet security and compliance needs.
Develop and maintain automation and deployment processes to enable and improve developer experience and efficiency.
Quanata aims to ensure a better world through context-based insurance solutions. They are a customer-centered team creating innovative technologies and digital products, backed by State Farm, blending Silicon Valley talent with the long-term backing of a leading insurer.
Operate and evolve multi-cloud streaming clusters and related database infrastructure, diagnosing and eliminating cross-layer failure modes.
Design safe upgrade and rollout strategies at scale, improving observability, automation, and operational ergonomics.
Partner closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack.