Design, build, and operate reconciliation systems that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Design, develop, and deliver high-quality backend services and APIs primarily using Go (Golang) and deploy them in Kubernetes environments.
Build and maintain automated tests for backend services, participate in code reviews, and monitor service performance in production to debug issues.
Provide technical guidance on backend architecture and integration challenges, sharing knowledge and supporting continuous improvement of processes and documentation.
Applied Systems provides innovative software and services for the insurance industry. They are an established insurtech company with 40+ years of experience and focus on creating a collaborative, value-driven culture for their team.
Design, build, and operate reconciliation systems that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).
Troubleshoot complex customer issues across networking, APIs, and communications products.
Investigate issues using tools like Postman, cURL, Wireshark, the Linux terminal, and our internal tooling.
Collaborate with customers, vendors, and engineering teams to resolve cases.
Telnyx is building the future of global connectivity, from architecting a global, multi-cloud IP network to bringing hyperlocal edge technology. They solve real-world problems through innovative connectivity solutions and foster an environment of continuous learning and growth for their team.
Supervise and coordinate NOC activities to ensure network infrastructure availability and efficiency.
Monitor network performance, proactively implementing solutions to prevent service disruptions.
Develop and maintain standard operating procedures and best practices for the NOC team.
Honest Networks delivers high-quality and affordable internet service as a catalyst for community growth, fostering learning, creativity, and enjoyment. They are a rapidly expanding, venture-backed internet service provider headquartered in Manhattan.
Lead and grow a team of backend/platform engineers building the systems that power Intercept and other critical applications.
Drive execution across planning, prioritization, and delivery with a focus on reliability and scalability.
Own the technical direction of backend services and APIs, ensuring systems are secure, performant, and maintainable.
SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They're building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.
Own Render's core network infrastructure across multiple data centers and cloud providers, shaping how networking evolves as Render rapidly scales.
Design and build customer-facing networking capabilities that give users greater flexibility in how their services connect and communicate, and how traffic is routed.
Investigate complex networking issues across the stack, from the kernel and data plane to distributed systems and edge networking.
Render is building a modern cloud platform for developers creating AI-native, full-stack, multi-service applications, eliminating the tradeoff between hyperscaler power and developer-friendliness. They are a diverse and talented team that values craft, velocity, and user experience.
Build and maintain a Python fleet tracking system that manages the full lifecycle of servers.
Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery and alerting.
Create and maintain metrics, dashboards, and alerting for hardware health across the fleet.
FAL is committed to keeping a large fleet of GPU servers healthy and productive. They offer a collaborative and supportive culture with learning and growth opportunities.
Build scalable backend services and APIs that power our digital merchandising platform.
Work with other senior engineers to contribute to high-level decisions about architecture and design.
Work with Product Managers to make Jane’s advertising product offerings sound, robust and easy to use.
Jane Technologies is an MIT-founded eCommerce company in the cannabis industry experiencing rapid growth. Their mission is to bring confidence to the online cannabis shopping experience by connecting consumers with local dispensaries and brands. They are a small, close-knit team of highly technical engineers with diverse backgrounds and a strong engineering culture.
Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
Refine alerts and dashboards for critical services to catch issues earlier.
Automate routine checks and monitoring tasks to free up engineers.
PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics from ticketing and streaming to fundraising and merchandise.
Design and deliver software solutions in Go to improve the availability, scalability, and latency of Reddit's compute infrastructure.
Develop Kubernetes controllers and operators to automate cluster management, workload scheduling, and the reconciliation of complex system states.
Build core tooling and SDKs that codify network configurations, managed services, and compute capacity tracking across a multi-region fleet.
Reddit is an online platform built on shared interests, passion, and trust, home to open and authentic conversations. It is a large-scale community with over 100,000 active communities and approximately 126 million daily active unique visitors.
Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus builds and maintains technology that is transforming the Govtech industry by bringing governments and their constituents together. They serve 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers, and are known for being one of the best companies to work for.
Rotate across engineering squads to tackle high-impact projects.
Ship production code in multiple languages and stacks.
Debug and optimize distributed systems processing thousands of messages per second.
Telnyx is building the future of global connectivity. They have a private, global, multi-cloud IP network and use intuitive APIs. They are a financially stable and profitable company that fosters an environment of continuous learning and growth for their team.
Help guide technical direction and contribute to platform architectural strategy.
Champion engineering principles and hold the bar on code quality.
Elevate engineers around you through pairing and knowledge sharing.
Arctic Wolf is a cybersecurity company that helps organizations end cyber risk. They have a global presence with over 10,000 customers and more than 2,000 channel partners, and are known for their award-winning Aurora Platform.
You will lead a dedicated team, driving innovation while prioritizing safety and efficiency.
Manage and lead a team of engineers, providing technical direction and setting clear goals.
Support incident response efforts by building ad-hoc tools for threat hunting and remediation.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Building infrastructure as code and DevOps pipelines and reviewing solutions.
Researching and analyzing technical solutions, maintaining and enhancing documentation.
Proactively identifying blockers, risks, and issues, proposing solutions or escalating as appropriate.
Nava is a consultancy and public benefit corporation working to make government services simple and effective. They guide agencies constrained by legacy systems to a future with sharp user experiences built on secure, reliable, fault-tolerant cloud infrastructure.
Manage and grow a distributed team of engineers, providing feedback and supporting career development.
Partner with product management to shape the Usage squad's roadmap, ensuring alignment with company mission and customer impact.
Guide the team through the full project lifecycle, ensuring high-quality and timely outcomes within the Usage domain.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users globally. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Engage with customers to provide technical assistance, troubleshooting, and best-practice guidance.
Diagnose, reproduce, and resolve issues related to agent connectivity, device enrollment, patch deployment, and software installation.
Collaborate cross-functionally with Engineering, Customer Success, Professional Services, and Product teams to resolve customer issues.
Automox is a cloud-native IT operations platform for modern organizations, helping to keep every endpoint automatically configured, patched, and secured – anywhere in the world. They are trusted by more than 2,500 leading companies and MSPs worldwide.
Drive architecture and technical strategy for core platform systems, APIs, and data pipelines.
Hire, manage, and develop a high-performing engineering team.
Partner with Product and Data teams to define scope, timelines, and tradeoffs.
VulnCheck is transforming exploit intelligence by helping security teams act faster. They deliver exploit intelligence, asset correlation, and contextual insights. Founded in 2021 in Lexington, Massachusetts, they have a transparent, collaborative, and supportive culture.
Build prototypes and POCs that showcase Tailscale for AI agents and tooling.
Work with reference customers to integrate Tailscale, both for internal adoption and for embedding into their products to enable secure customer connectivity.
Create reference architectures and share your work through documentation, open source, community engagement, and conference presentations.
Tailscale is building a new Internet by delivering software that makes it easy to securely interconnect people and their devices, no matter where they are. They are a fully distributed company, and teams of every size use Tailscale each day to protect their networks and share access to internal tools.