Source Job

United States 6w PTO

  • Build and operate the internal engineering platform that provides application engineers with the tools, systems, and Kubernetes clusters they need to deploy and run their workloads.
  • Focus on cloud infrastructure, capacity management, security, engineering productivity, monitoring, and US Federal compliance across squads.
  • Participate in on-call rotations to ensure the health of the system and understand how people use our products.

Go Python Kubernetes Distributed Systems

20 jobs similar to Staff Software Engineer - Platform, SysEng

Jobs ranked by similarity.

Europe 6w PTO

  • Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
  • Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

US Canada 6w PTO

  • Earning the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives.
  • Designing and leading the development of backend services, distributed systems, and enterprise features at scale.
  • Driving continuous improvement of our engineering culture through words and actions.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack. The Grafana team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

US 6w PTO

  • Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
  • Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

Germany 6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.

Spain 6w PTO

  • Take an active role in influencing our roadmap and your own career objectives.
  • Design, build, operate, and maintain critical systems, owning their reliability, performance, and availability.
  • Mentor and support other team members, participate in design discussions, and collaborate with the team.

Grafana Labs is a remote-first, open-source powerhouse that provides visualization tools. They help companies manage their observability strategies. Grafana Labs has a global collaborative culture and a passion for meaningful work.

Canada Unlimited PTO

  • Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
  • Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
  • Take part in on-call rotation and drive root-cause resolution and long-term fixes.

ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.

$124,845–$146,205/yr
Europe 6w PTO

  • Manage and grow a distributed team of engineers, providing feedback and supporting career development.
  • Partner with product management to shape the Usage squad's roadmap, ensuring alignment with company mission and customer impact.
  • Guide the team through the full project lifecycle, ensuring high-quality and timely outcomes within the Usage domain.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users globally. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

Europe 6w PTO

  • Develop and maintain features as part of Observability solutions in Grafana Cloud.
  • Contribute to the design and implementation of high-quality, scalable integrations for various infrastructure components, databases, and applications.
  • Build prototypes and present your ideas as part of a cross-functional team.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and thrive in an innovation-driven environment with a global collaborative culture.

$116,449–$139,531/yr
Europe 6w PTO

  • Take an active role in influencing our roadmap and your own career objectives.
  • Drive projects from initial ideation all the way to operations once they are in the hands of customers.
  • Design, build, operate, and maintain critical systems, owning their reliability, performance, and availability.

Grafana Labs is behind the open observability cloud, and is founded on the principles of open source, open standards, open ecosystems, and open culture. They are a 100% remote company with 1,600+ team members across 40+ countries.

UK 6w PTO

  • Act as a trusted technical partner, guiding organizations through onboarding, implementation, and expansion with white-glove support and best practices.
  • Deliver high-impact training, jumpstart engagements, and provide tailored technical consulting to help customers succeed.
  • Identify recurring issues, monitor support needs, and advocate for product improvements in close collaboration with internal teams.

Grafana Labs is the company behind Grafana, the open observability platform. With over 1,600 team members across 40+ countries, we are a 100% remote company backed by leading investors and trusted by more than 35 million users and 7,000+ customers.

Germany Spain Ireland UK 6w PTO

  • Lead a team covering corporate systems, employee device lifecycles, helpdesk queues, and internal tooling development.
  • Handle corporate security initiatives and compliance checks in an employee-enabling way.
  • Help define the wider internal AI rollout, enablement, and security strategy across teams.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack and thrive in an innovation-driven environment, scaling fast while staying true to their open-source legacy and global collaborative culture.

Global 16w maternity 16w paternity

  • Lead the design and implementation of self-service platform infrastructure for provisioning, deployment, and observability across engineering teams.
  • Evolve multi-tenant EKS foundations toward better reliability, security, scale, and multi-region connectivity.
  • Set delivery standards using Terraform, GitOps, and progressive rollout, while improving SLOs and alerting on Grafana Cloud.

Docker is a developer tooling company trusted by over 20 million monthly users, with 20 billion container image pulls. They are a globally distributed, remote-first team building tools that define how software gets built and delivered.

$180,000–$200,000/yr
US

  • Own and evolve a scalable observability platform spanning metrics, logs, traces, and events.
  • Design telemetry pipelines ingesting data from GPUs, CPUs, networking, containers, APIs, and BMC/Redfish.
  • Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load.

Lightning AI builds an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. They combine developer-first software with cost-efficient, large-scale compute, serving solo researchers, startups, and large enterprises.

EMEA

  • Design and build large-scale distributed systems and high-throughput data pipelines using Go and cloud-native technologies.
  • Lead system-wide architectural decisions focusing on data flow, performance, and resilience.
  • Champion best engineering practices, code quality, testing, and maintainability while mentoring junior engineers.

DoiT is a global technology company that helps organizations leverage the cloud for business growth, combining data, technology, and human expertise. With thousands of customers worldwide, DoiT fosters a remote-first culture that values entrepreneurship, knowledge pursuit, and fun.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

Europe

  • Join the security team to build world-class security into products, focusing on operations, monitoring, and incident response.
  • Proactively improve security across codebase, product, cloud, and customer deployments.
  • Work as a generalist covering all facets of security, from application testing to threat modeling.

Sourcegraph builds the world's most powerful code intelligence platform, helping developers and agents navigate complex codebases. They are a globally distributed team backed by a16z, Sequoia, and Redpoint, with a culture of high agency and direct communication.

Europe

  • Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
  • Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
  • Define and implement our observability strategy across metrics, logs, and tracing.

Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.

$160,000–$200,000/yr
US

  • Drive the stability and reliability of Epic's GCP infrastructure.
  • Manage and harden our Docker and GKE container platform.
  • Maintain and improve CI/CD pipelines.

Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. As Epic continues to grow, we are reimagining what reading can be through thoughtful technology, data, and global collaboration to make learning more engaging, accessible, and impactful.

$210,000–$278,000/yr
US Unlimited PTO

  • Architect future iterations of core systems, addressing scaling requirements.
  • Design and implement developer tools to enhance deployment safety and reproducibility.
  • Drive excellence in monitoring and guide incident response for quick issue resolution.

Found provides tools for self-employed individuals, offering a business bank account that automates taxes and expense tracking. They aim to give self-employed people the security and peace of mind historically available only at large corporations and are looking for kind, resourceful, and passionate people.

$280,000–$315,000/yr
US Unlimited PTO

  • Lead the strategy, design, and development of the core pentest execution.
  • Manage engineering teams responsible for building and scaling the platform systems.
  • Partner with product teams to define platform requirements and deliver capabilities.

Horizon3.ai is a cybersecurity company dedicated to enabling organizations to proactively find, fix, and verify exploitable attack vectors. They are a team of former U.S. Special Operations cyber operators and startup engineers committed to solving security problems.