Build and operate the internal engineering platform that provides application engineers with the tools, systems, and Kubernetes clusters they need to deploy and run their workloads.
Focus on cloud infrastructure, capacity management, security, engineering productivity, monitoring, and US Federal compliance across squads.
Participate in on-call rotations to ensure the health of the system and understand how people use our products.
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. We are a 100% remote company with 1,600+ team members across 40+ countries, backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital.
Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
Take part in the on-call rotation and drive root-cause resolution and long-term fixes.
ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.
Earning the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives.
Designing and leading the development of backend services, distributed systems, and enterprise features at scale.
Driving continuous improvement of our engineering culture through words and actions.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack. The Grafana team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Own and operate customer-facing managed infrastructure across multiple AWS accounts and regions.
Serve as the senior technical escalation point for production incidents and complex configurations.
Contribute to OpenTelemetry distributions and maintain open source projects like Refinery.
Honeycomb provides observability for developer tools, helping companies like HelloFresh and Slack understand their software. They have over 200 employees and were named to Forbes' Best Startups in 2022 and 2023, with a culture that values inclusion and autonomy.
Anticipate the needs of the Solutions Engineering team and support it by designing technical presentations, demos, and white papers.
Create and deliver training materials, product workshops, and webinars for internal teams and customers.
Partner with Product, Marketing, and Engineering to enable the field with deep technical expertise and strategic support.
Grafana Labs is the company behind the open-source observability platform, providing a fully managed cloud service for monitoring and analytics. With over 1,600 team members across 40+ countries, they foster a global collaborative culture rooted in open source, transparency, and autonomy.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Work with your team to deliver new functionality, then use results to iterate and improve.
Take an active role in influencing our roadmap and your career objectives.
Mentor and support other team members, participate in design discussions, and collaborate with the team.
Grafana Labs is the company behind the open observability cloud, Grafana Cloud, built on open-source principles. With over 1,600 team members across 40+ countries, we foster a global collaborative culture backed by leading investors.
Take an active role in influencing our roadmap and your own career objectives.
Drive projects from initial ideation all the way to operations once they are in the hands of customers.
Design, build, operate, and maintain critical systems, owning their reliability, performance, and availability.
Grafana Labs is behind the open observability cloud, and is founded on the principles of open source, open standards, open ecosystems, and open culture. They are a 100% remote company with 1,600+ team members across 40+ countries.
Co-own the architecture of cloud infrastructure on Azure and Kubernetes clusters for high throughput and availability.
Drive resilience strategy for global scaling, zero-downtime deployments, and disaster recovery.
Evolve the observability stack with LGTM (Loki, Grafana, Tempo, Mimir) and lead incident response.
Flip is an AI-powered employee experience platform for frontline workers in retail, manufacturing, and logistics. It is a young, rapidly growing tech company with a remote-first culture and offices in Berlin and Stuttgart.
Lead the design, development, and operation of large-scale, secure observability systems to keep services online and performant.
Deploy and scale Prometheus, Elasticsearch clusters, and high-throughput Kafka data pipelines for millions of customer devices.
Collaborate with the Observability team to build alerting systems, APIs, and self-service monitoring tools using Terraform and multiple languages.
ItD is a new generation consulting and software development company that blends diversity, innovation, and integrity with real business results. It is a woman- and minority-led firm with a global community, empowering employees and offering benefits like medical, dental, vision, 401(k), and career development.
Act as a first responder for system incidents and outages, ensuring high availability and performance.
Own and evolve monitoring, alerting, and log management systems while optimizing database infrastructure.
Collaborate with engineering teams to build scalable, resilient systems and contribute to SRE tooling and automation.
Circle is building the world's leading all-in-one platform for online communities. We're a fully remote company of around 200 team members from 30+ countries, with a culture that values autonomy, async collaboration, and high expectations.
Act as a trusted technical partner, guiding organizations through onboarding, implementation, and expansion with white-glove support and best practices.
Deliver high-impact training, jumpstart engagements, and provide tailored technical consulting to help customers succeed.
Identify recurring issues, monitor support needs, and advocate for product improvements in close collaboration with internal teams.
Grafana Labs is the company behind Grafana, the open observability platform. With over 1,600 team members across 40+ countries, we are a 100% remote company backed by leading investors and trusted by more than 35 million users and 7,000+ customers.
Architect and scale the cloud platform behind a mission-critical SaaS product used globally.
Lead Infrastructure as Code maturity and drive automation, reliability, and cost optimisation.
Own uptime, SLAs, and incident management practices while mentoring engineers.
Innocraft (trading as Matomo) provides an open-source analytics platform trusted by enterprises and governments for full data ownership. The company values diversity and inclusion, and operates with a stable, mature product and strong engineering team.
Design and build the control plane for provisioning, scaling, and maintaining Postgres clusters.
Develop high availability, disaster recovery, and data protection mechanisms for production systems.
Build automation for database operations and contribute to distributed, fault-tolerant systems.
PlanetScale builds a next-generation managed database platform powering mission-critical applications at global scale. They are a remote-first engineering team with a collaborative culture focused on technical excellence and knowledge sharing.
Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
Build and operate observability systems while leading incident response and reliability improvements.
Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.
This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.
Set delivery standards using Terraform, GitOps, and progressive rollout, while improving SLOs and alerting on Grafana Cloud.
Docker is a developer tooling company trusted by over 20 million monthly users, with 20 billion container image pulls. They are a globally distributed, remote-first team building tools that define how software gets built and delivered.
Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Take ownership of incident management and operational excellence across cloud infrastructure.
Automate high-risk manual processes and drive reliability gains through engineering.
Own a platform domain such as Temporal, observability, or Kubernetes operations.
Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London with offices across Europe and the US, and has over $530 million in funding from premier investors like Accel and Nvidia's VC arm.
Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.
Remote solves modern organizations' biggest challenge of navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture with core values of innovation and async work.