Take an active role in influencing our roadmap and your own career objectives
Work with your team to deliver new features, then use the results to iterate and improve
Drive projects from initial ideation all the way to operations once they are in the hands of customers
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The company helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Our team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything we do.
Coding new features, enhancing operational experience, and iteratively improving systems.
Authoring, contributing to, and reviewing design documents and shaping roadmaps.
Mentoring team members and participating in on-call rotations to own customer experience.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users globally. They help more than 3,000 companies, including Bloomberg, JPMorgan Chase, and eBay, manage their observability strategies. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Designs, implements, and continuously improves observability strategies across services.
Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.
Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America who partner with U.S. companies on impactful projects, and they value their people's expertise and aspirations.
Serve as the primary technical point of contact for a portfolio of Grafana customers.
Design customers' observability maturity journeys and guide them along that path.
Conduct regular technical reviews and health checks to ensure client success.
Grafana Labs is a remote-first, open-source powerhouse. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.
Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.
Lead the entire Software Development Lifecycle from start to finish.
Design and build multi-component, distributed systems that operate at scale.
Investigate issues with a methodical approach to identify their root cause.
Temporal is an open-source programming model company. Their mission is to be the reliable foundation of every developer’s toolbox, and they are building the team that will make that happen. The company values curiosity, drive, collaboration, genuineness, and humility, and is looking for people who share those values.
Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
Improve reliability, scalability, and security of production and non-production environments.
Improve local development environments and onboarding experience for engineers.
Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.
Understand and participate in the changing FedRAMP space.
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.
Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.
Configure and maintain cloud infrastructure automation using Terraform, focusing on CDN optimization and content delivery performance.
Develop capacity planning strategies and performance optimization initiatives for high-volume spatial content delivery.
Instrument services to understand system health.
Miris is a cutting-edge technology company building the future of 3D content delivery at global scale. Our mission is to empower creators and developers to deliver high-fidelity, photorealistic 3D experiences to billions of users instantly, seamlessly, and across all major platforms and devices.
Make deployments boring (in the best way possible)
Own CI/CD pipelines: optimize build times, improve caching, reduce flakiness
Evolve our Kubernetes (EKS) deployment strategy for reliability and speed
Obvious is building an AI-native workspace, an operating system for work that puts co-intelligence at the center. They are a small and talent-dense team with world-class builders, former founders, and leaders from companies like Netflix, Google, and Meta.
Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Architect, maintain, and scale critical infrastructure.
Ensure system reliability and optimize performance.
Implement modern deployment strategies.
Scribe's Workflow AI platform automatically captures and optimizes workflows so teams work smarter, faster, and more consistently. They are a fast-growing company founded in 2019 with over 5 million users across 600,000 businesses, and they are backed by leading investors.
Support and evolve the reliability of platforms used by the AI Research team.
Ensure production services meet expectations for availability, latency, and operational readiness.
Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.
Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. They raised $150 million in Series D funding, which quadrupled their valuation to $2.25 billion, and are investing it in their market-leading platform.
Provide cross-organizational leadership, effectively translating strategic objectives into actionable programs.
Lead end-to-end technical programs and large-scale initiatives from inception to delivery.
Identify and mitigate risks and interdependencies across complex, multi-team initiatives.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
You’ll own challenging infrastructure problems end-to-end.
You’ll design scalable, maintainable services and contribute to technical proposals.
You’ll contribute to the roadmap for our Provisioning team.
Canva is a design platform that enables users to create a variety of visual content. They have campuses in Sydney and Melbourne, co-working spaces in other major cities, and offer a flexible work environment.
Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions
Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have $100M in funding and a globally distributed team, and they are committed to building a diverse and inclusive workplace.
Design and implement core backend service features
Provide appropriate unit, integration, and performance test coverage for the features you own
Clearly document design choices and the operational knowledge needed to successfully deploy and run the service with those features
Temporal provides an open-source programming model that simplifies code and enhances application reliability, allowing developers to focus on feature delivery. They are a growing company that aims to be the reliable foundation of every developer's toolbox, with a curious, driven, collaborative, genuine, and humble culture.
Design, implement, and test enhancements to Control APIs and services using Go on Kubernetes / GCP.
Build scalable systems for telemetry shaping and cost control, such as ingest policies and usage-based controls.
Partner closely with Product and Design to deliver customer-facing experiences, making it easy to understand telemetry cost drivers.
Chronosphere is an observability platform built for control in the modern, containerized world. They empower customers to focus on data and insights by reducing data complexity, optimizing costs, and remediating issues faster. Chronosphere is trusted by innovative brands like Snap, Robinhood, DoorDash, and Zillow.
Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard
Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.
Own and maintain the incident response process, including defining procedures, tools, and best practices
Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
Lead capacity planning initiatives, focusing on both short and long-term scalability while optimizing costs
Underdog makes sports more fun by building the best products for sports fans. They are a fast-growing sports company valued at $1.3B with a focus on a seamless, simple, easy-to-use, intuitive, and fun app.