Build and operate the internal engineering platform that provides application engineers with the tools, systems, and Kubernetes clusters they need to deploy and run their workloads.
Focus on cloud infrastructure, capacity management, security, engineering productivity, monitoring, and US Federal compliance across squads.
Participate in on-call rotations to ensure the health of the system and understand how people use our products.
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. We are a 100% remote company with 1,600+ team members across 40+ countries, backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital.
Design and build large-scale distributed systems and high-throughput data pipelines using Go and cloud-native technologies.
Lead system-wide architectural decisions focusing on data flow, performance, and resilience.
Champion best engineering practices, code quality, testing, and maintainability while mentoring junior engineers.
DoiT is a global technology company that helps organizations leverage the cloud for business growth, combining data, technology, and human expertise. With thousands of customers worldwide, DoiT fosters a remote-first culture that values entrepreneurship, knowledge pursuit, and fun.
Design and deliver robust, high-scale routing experiences for Data Pipelines for Twilio Segment.
Operate always-available, complex distributed systems in cloud environments.
Collaborate cross-functionally with design, product, and other engineers to define solutions.
Twilio is shaping the future of communications, delivering innovative solutions to hundreds of thousands of businesses and empowering millions of developers worldwide. The company is remote-first with a strong culture of connection and global inclusion, and employs a diverse team of Twilions.
Architect and operate high-scale ingestion and data processing systems at Twilio Segment.
Lead the development of complex distributed systems ensuring reliability, performance, and cost-efficiency.
Translate technical strategies into actionable plans for diverse stakeholders including Product Managers and Architects.
Twilio is a communications platform that delivers innovative solutions to hundreds of thousands of businesses and empowers millions of developers worldwide. The company is remote-first with a strong culture of connection and global inclusion, employing a diverse team.
Write and maintain test suites that give the team confidence to ship.
Package and deploy open source software components in Kubernetes environments.
Contribute to internal tooling, dashboards, and documentation that make complex systems understandable.
Defense Unicorns delivers mission value by streamlining software delivery for defense and government customers. Their team of innovators, software engineers, and veterans focuses on security, speed, and user experience in a remote-first culture.
Own moderately complex backend features and services in Go on GCP end-to-end from design through production.
Write clean, tested, production-ready code and improve it continuously.
Contribute to infrastructure, observability tooling, and on-call preparation.
Chainguard is the trusted source for open source software, delivering hardened and secure builds to eliminate risk. The company is venture-backed by leading investors and serves Fortune 500 enterprises.
Lead a platform team building high-throughput messaging, eventing, and notification infrastructure.
Manage and grow engineers, driving roadmap planning and cross-team communication.
Own delivery reliability for email, chat integrations, and event pub/sub systems at scale.
KnowBe4 empowers the modern workforce to make smarter security decisions every day. Trusted by more than 70,000 organizations worldwide, the company is the pioneer of digital workforce security.
Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
Take part in the on-call rotation and drive root-cause resolution and long-term fixes.
ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.
Provide frontline technical expertise to help developers deploy and scale Temporal in cloud-native environments.
Troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.
Collaborate with engineering and product teams to influence platform improvements and enhance developer experience.
Temporal provides an open source programming model that simplifies code and makes applications more reliable. The company is a growing team driven by values of curiosity, collaboration, and humility, focused on improving developer experience.
Help guide technical direction and contribute to platform architectural strategy.
Champion engineering principles and hold the bar on code quality.
Elevate engineers around you through pairing and knowledge sharing.
Arctic Wolf is a cybersecurity company that helps organizations end cyber risk. It has a global presence with over 10,000 customers and more than 2,000 channel partners, and is known for its award-winning Aurora Platform.
Design, build, and operate high-scale data ingestion and replication systems from production data stores into the data lakehouse.
Build and maintain reliable, scalable data platform infrastructure capable of handling petabytes of data across analytics, AI, and operational use cases.
Develop internal libraries, APIs, frameworks, and tooling in languages such as Go and Python to help teams move and access data safely.
Samsara is the pioneer of the Connected Operations Cloud, enabling organizations that depend on physical operations to harness IoT data for actionable insights. As a publicly traded company, Samsara fosters a growth-oriented culture and serves industries that represent over 40% of global GDP.
Design, develop, and maintain filesystem and container runtime components of Docker's local runtime stack.
Investigate and resolve correctness, performance, and stability issues across macOS, Windows, and Linux.
Work on VirtioFS, OverlayFS, and related filesystem technologies for AI agent workloads.
Docker provides developer tooling trusted by over 20 million monthly users and accounting for 20 billion container image pulls, enabling teams to build, share, and run applications. They are a globally distributed, remote-first team defining how software gets built and delivered with AI agent integration.
Set technical direction for the Athena clearing house, making architectural calls on data validation pipelines and workflow orchestration.
Scale the team and product area, driving the transition from rapid prototyping to a sustainable, production-grade product stack.
Lead design of systems processing unstructured vulnerability reports, deduplicating findings, and surfacing clean signals to remediation teams.
Chainguard secures the open source software supply chain, delivering hardened, production-ready builds of open source software. They are venture-backed by leading investors and serve Fortune 500 enterprises and global industry leaders.
Take part in all development aspects from design through production of a dynamic hybrid/cloud-native product.
Write high-quality, testable, and efficient code in Go and TypeScript.
Initiate and promote new ideas for continuous improvement of the product functionality.
Torq is a cybersecurity company with an AI SOC platform that grabs attention. Backed by Series D funding, they have experienced 200% employee growth and 300% revenue growth, making them one of Forbes' Best Startup Employers in America and a Business Insider 'startup to bet your career on'.
Design and build scalable components for high-throughput data ingestion and processing.
Develop systems for storing and serving batch data, and contribute to API services and event-driven applications.
Optimize data storage and retrieval for high throughput, security, and ease of access, and mentor peers.
Phaidra builds AI-powered control systems for industrial facilities, using reinforcement learning to optimize automation. The company is fully remote with a global team of 100+ employees, emphasizing a culture of transparency, collaboration, ownership, and empathy.
Earning the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives.
Designing and leading the development of backend services, distributed systems, and enterprise features at scale.
Driving continuous improvement of our engineering culture through words and actions.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack. The Grafana team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
United States, Canada, UK
Unlimited PTO
18w maternity
12w paternity
Build and maintain core components of the clearing house in Go on GCP, including customer onboarding flows and data ingestion pipelines.
Take ownership of ambiguous problems and drive features from design through production with appropriate testing and observability.
Participate in the on-call rotation, contribute to incident response, and become a go-to engineer for core subsystems.
Chainguard is the trusted source for secure open source software, delivering hardened builds for enterprise customers. The company is venture-backed by leading investors and serves Fortune 500 enterprises.
Build data-intensive systems, designing high-throughput integrations with databases, data lakes, and warehouses.
Own end-to-end reliability by debugging complex issues and improving infrastructure.
Drive product innovation by working with customers and collaborating cross-functionally.
ClickHouse is a leading real-time analytics company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, they provide a platform for data warehousing, observability, and AI workloads.
Design and build backend systems, APIs, infrastructure, and platform capabilities that improve developer workflows across Reddit.
Build scalable and reliable systems across both AI-powered developer workflows and the core non-AI systems engineers rely on every day.
Lead high-impact projects across Reddit’s developer tooling ecosystem by writing and reviewing code and design docs, aligning stakeholders, and making pragmatic technical tradeoffs.
Reddit is a community-based platform built on shared interests, passion, and trust, facilitating open and authentic conversations. With over 100,000 active communities and approximately 126 million daily active unique visitors, it serves as one of the internet’s largest sources of information.
Own an end-to-end domain within the clearing house: customer onboarding, entitlements, or data validation.
Drive architecture and implementation of backend systems in Go on GCP, ensuring production readiness.
Establish engineering best practices and collaborate with the principal engineer on technical planning.
Chainguard secures the open source software supply chain by providing hardened, secure builds of open source software. It is a venture-backed startup with a remote-first culture, trusted by Fortune 500 enterprises.