Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
Take part in the on-call rotation and drive root-cause resolution and long-term fixes.
ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.
Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.
Remote addresses modern organizations' biggest challenge: navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture built on core values of innovation and async work.
Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
Build and operate observability systems while leading incident response and reliability improvements.
Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.
This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.
Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving and related systems.
Design, build, and maintain infrastructure, tooling, and automation to improve service reliability and engineering productivity.
Participate in on-call rotations, lead incident response, and drive root cause analysis and corrective actions.
Reddit is a community of communities built on shared interests, passion, and trust. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.
Build and operate the internal engineering platform that provides application engineers with the tools, systems, and Kubernetes clusters they need to deploy and run their workloads.
Focus on cloud infrastructure, capacity management, security, engineering productivity, monitoring, and US Federal compliance across squads.
Participate in on-call rotations to ensure the health of the system and understand how people use our products.
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. It is a 100% remote company with 1,600+ team members across 40+ countries, backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital.
Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
Reddit is a community of communities, built on shared interests, passion, and trust, home to the most open and authentic conversations on the internet. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.
Design and build tools and frameworks to automate operational tasks and deployments for Portal and Endpoint Agents.
Evolve AI tooling and workflows to enhance developer productivity and integrate AI into daily development.
Build and maintain CI/CD pipelines, support product teams, and optimize software architecture for scalability and reliability.
Huntress is a cybersecurity company founded in 2015 by former NSA cyber operators, focused on protecting small to midsize businesses from cyber attacks through its award-winning security platform and expert human threat hunters. The company is fully remote and fosters a culture of inclusivity, innovation, and collaboration.
Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
Lead incident response and postmortems in a blameless culture.
Redcare Pharmacy is Europe’s No. 1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision: “Until every human has their health”.
Provide frontline technical expertise to help developers deploy and scale Temporal in cloud-native environments.
Troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.
Collaborate with engineering and product teams to influence platform improvements and enhance developer experience.
Temporal provides an open source programming model that simplifies code and makes applications more reliable. The company is a growing team driven by values of curiosity, collaboration, and humility, focused on improving developer experience.
Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
Build scalable monitoring stacks with Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
Define SLOs, SLIs, error budgets, and alerting strategies to ensure system reliability.
Our partner is a technology company focused on building scalable observability platforms for distributed systems. They are an engineering-driven organization with a strong emphasis on automation, scalability, and developer experience.
Provide technical leadership to shape architecture and drive execution on high-leverage SOC workflows.
Partner with engineering, product, UX, and SOC stakeholders to build reliable, scalable investigation systems.
Mentor senior engineers and make pragmatic technical decisions that improve analyst effectiveness and customer outcomes.
Huntress is a cybersecurity company founded in 2015 by former NSA cyber operators, dedicated to making enterprise-grade security accessible for businesses of all sizes. They are a remote-first team that secures over 5 million endpoints and 11 million identities worldwide, fostering a culture of inclusivity and collaboration.
Take an active role in influencing our roadmap and your own career objectives.
Drive projects from initial ideation all the way to operations once they are in the hands of customers.
Design, build, operate, and maintain critical systems, owning their reliability, performance, and availability.
Grafana Labs is behind the open observability cloud, and is founded on the principles of open source, open standards, open ecosystems, and open culture. They are a 100% remote company with 1,600+ team members across 40+ countries.
Independently troubleshoot enterprise CI/CD and infrastructure issues for top tech companies.
Design and implement proactive tools, processes, and open source contributions.
Provide support via Slack, Zoom, and Community Forum with no on-call duties.
Buildkite is rethinking software delivery, building a fast, reliable, and secure CI/CD platform for high-growth tech companies like Airbnb and Canva. They are a remote-first company with a culture of kindness, autonomy, and collaboration.
Earning the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives.
Designing and leading the development of backend services, distributed systems, and enterprise features at scale.
Driving continuous improvement of our engineering culture through words and actions.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack. The Grafana team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.
Design, provision, and manage AWS infrastructure using Terraform and Kubernetes.
Build, operate, and improve observability, monitoring, and incident response processes.
Collaborate with engineering teams on capacity planning, performance optimization, and resilient system design.
Vynca provides comprehensive care for individuals with complex needs, focusing on quality days at home. The company is a close-knit community guided by core values of Excellence, Compassion, Curiosity, and Integrity.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Own and evolve AWS infrastructure using Terraform, managing EKS clusters, databases, and core services.
Maintain CI/CD reliability and developer tooling across the full engineering org.
Lead incident response, drive post-incident reviews, and improve monitoring and alerting standards.
Babylist is the leading platform for expecting and new families, helping parents feel confident, connected, and cared for at every step. As a modern, AI-forward tech company with over 10 million yearly shoppers, Babylist has expanded into a full ecosystem and generated $750M in revenue in 2025, reshaping the $235B kids and baby market.
Plan and iterate over the product development lifecycle alongside other engineers.
Break down complex technical challenges into manageable work components with clear deliverables.
Collaborate with cross-departmental teams to ensure technical solutions meet business interests and maintain data integrity.
Feathr is a marketing platform for nonprofits, trusted by over 1,300 organizations, providing software to help them build community connections and grow impact. The company is building a strong culture with a team focused on helping the helpers.
Design and build core platform infrastructure for large-scale cloud-native data and analytics systems.
Own and improve CI/CD pipelines, testing frameworks, and deployment in a high-scale PaaS environment.
Contribute to reliability engineering, observability, and operational excellence across distributed systems.
Jobgether uses an AI-powered matching process to connect candidates with roles. The company is a growing platform focused on efficient job matching and data privacy compliance.