Source Job

20 jobs similar to Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring

Jobs ranked by similarity.

Europe 6w PTO

  • Design and implement high-quality, scalable integrations for various infrastructure components, applications, and data ingestion pipelines.
  • Create middleware components and libraries that simplify development and maintenance of observability solutions.
  • Lead the technical direction and vision of the team, contributing to strategic discussions and future development of observability solutions.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics, logs, and traces, and thrive in an innovation-driven environment.

Europe 6w PTO

  • Develop and maintain features as part of Observability solutions in Grafana Cloud.
  • Contribute to the design and implementation of high-quality, scalable integrations for various infrastructure components, databases, and applications
  • Build prototypes and present your ideas as part of a cross-functional team

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. It helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.

Europe 6w PTO

  • Take an active role in influencing our roadmap and your own career objectives
  • Help your team drive projects from initial idea all the way to operations once they are in the hands of customers
  • Embrace our open-source culture and contribute to other projects that may not directly fall within your team’s scope

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. Grafana Labs also helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.

Canada 6w PTO

  • Provide and own automation of the provisioning of CSP resources, including networking, Kubernetes clusters and specific CSP resources required by our application teams.
  • Work with users (Grafana Cloud application teams) to help understand their needs and ensure investment in the right capabilities.
  • Participate in the Platform department Infrastructure wing on-call rotation.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.

US 6w PTO

  • Operate and evolve multi-cloud streaming clusters and related database infrastructure, diagnosing and eliminating cross-layer failure modes.
  • Design safe upgrade and rollout strategies at scale, improving observability, automation, and operational ergonomics.
  • Partner closely with database and platform teams to ensure safe scaling, partitioning, consumer fan-out, and query performance.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack.

Europe 6w PTO

  • Drive technical strategy and roadmap.
  • Lead end-to-end delivery of large, cross-functional projects.
  • Own architecture, reliability, performance and cost for critical systems.

Grafana Labs provides an open source observability platform that integrates metrics, logs, traces, and profiles with Grafana. They have a global collaborative culture and a passion for meaningful work. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

Europe 6w PTO

  • Take an active role in influencing our roadmap and your own career objectives
  • Work with your team to deliver new features, then use the results to iterate and improve.
  • Drive projects from initial idea all the way to operations once they are in the hands of customers

Grafana Labs is a remote-first, open-source powerhouse with over 20M Grafana users globally. With a global collaborative culture, Grafana Labs fosters transparency, autonomy, and trust in an innovation-driven environment.

Europe 6w PTO

  • Partner closely with product engineering squads (embedded model)
  • Own production reliability for high-SLA and complex customer environments
  • Design and implement automation to scale our reliability practices

Grafana Labs is a remote-first, open-source powerhouse that helps more than 3,000 companies manage their observability strategies. They are scaling fast and staying true to what makes them different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.

$120,000–$140,000/yr
US Unlimited PTO

  • Architect and manage scalable cloud infrastructure within AWS.
  • Implement and maintain infrastructure using Terraform.
  • Develop automation scripts to improve operational efficiency.

Attune empowers insurance agents with its technology solutions. They foster a remote-first culture and value employee development.

Europe

  • Developing infrastructure to support cloud-based applications.
  • Creating deployment architecture and continuous delivery pipelines.
  • Designing high-availability approaches, and implementing monitoring architecture.

Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.

US 6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.

$205,000–$270,000/yr
US Unlimited PTO

  • Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Focus on automation so we can spend energy where it matters.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

Nigeria

  • Detect and triage service and reliability issues.
  • Develop automation to eliminate manual and repetitive operational tasks.
  • Investigate and resolve customer complaints escalated beyond L1 and L2 support.

Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has powered over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed.

$141,000–$230,000/yr
US

  • Collaborate with engineering teams to design and implement scalable, secure systems.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Enhance incident response processes and post-mortem analysis for outages.

ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

  • Architect observability platform: Design, implement, and maintain the LGTM stack as the primary observability platform across all engineering teams.
  • Build internal observability products: Design and develop production-grade internal platform products with React/TypeScript frontends and Python/Rust backends.
  • Develop custom log indexing systems: Architect and build high-performance log indexing solutions using Rust that process logs and provide sub-second search across billions of log lines.

Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans. They have a mission of rebuilding trust in healthcare in the U.S. and deploying the infrastructure we need for the care we deserve.

$200,000–$285,000/yr
Global

  • Manage and grow a team of engineers, conducting performance reviews and providing coaching.
  • Define and execute the technical vision for the observability platform.
  • Provide architectural oversight on instrumentation, logging, metrics, and tracing.

Jobgether uses an AI-powered matching process to ensure candidate applications are reviewed quickly, objectively, and fairly against a role's core requirements. They identify the top-fitting candidates and share this shortlist directly with the hiring company.

$127,651–$153,180/yr
US 6w PTO

  • Own the automation and decision making around our underlying CSP and compute security features for Grafana Cloud.
  • Aim to simplify secure decision making for engineers by providing pre-configured, secure templates and embedding security best practices into our development tools.
  • Help the rest of our engineers contributing to Grafana Cloud make the best security decisions possible for all the products we build, through security reviews and advisory.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack.

US Unlimited PTO

  • Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
  • Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
  • Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers

OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.

$160,000–$180,000/yr
US

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning.
  • Analyze, troubleshoot, and resolve operational challenges, contributing to defined SLOs.
  • Manage site stability, performance, reliability, and maintain uptime for production environments.

CentralReach provides autism and IDD care software for Applied Behavior Analysis (ABA), multidisciplinary therapy, and special education. They are trusted by more than 200,000 users and are backed by Roper Technologies, Inc. (Nasdaq: ROP). Their culture is centered around impact, inclusion, and flexibility.

$150,000–$188,000/yr
US 5w maternity

  • Support teammates with goal-setting, professional development, and mentoring.
  • Ensure delivery of maintainable, high-quality platform systems.
  • Build and sustain a healthy team culture where ownership and collaboration are the norm.

onX is a pioneer in digital outdoor navigation solutions through its suite of apps. With over 400 employees, they foster a fast-paced, tech-forward environment valuing ownership, accountability, and teamwork.