Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
Build and operate observability systems while leading incident response and reliability improvements.
Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.
This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.
Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.
Remote solves modern organizations' biggest challenge of navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture with core values of innovation and async work.
Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
Lead incident response and postmortems in a blameless culture.
Redcare Pharmacy is Europe’s No.1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision “Until every human has their health”.
Lead the design, development and operation of large-scale, secure observability systems to keep services online and performant.
Deploy and scale Prometheus, Elasticsearch clusters, and high-throughput Kafka data pipelines for millions of customer devices.
Collaborate with the Observability team to build alerting systems, APIs, and self-service monitoring tools using Terraform and multiple languages.
ItD is a new generation consulting and software development company that blends diversity, innovation, and integrity with real business results. It is a woman- and minority-led firm with a global community, empowering employees and offering benefits like medical, dental, vision, 401(k), and career development.
Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Design and build cloud-native infrastructure for reliability, observability, and automation across GCP, GKE, and Cloud Run.
Own incident response, root cause analysis, escalation workflows, and runbooks to prevent hard problems from recurring.
Develop Infrastructure as Code, CI/CD pipelines, and operational tooling to improve developer velocity and platform efficiency.
CertifyOS is building the data infrastructure that powers modern healthcare, automating provider licensing, enrollment, credentialing, and network monitoring through an API-first platform. The company is backed by leading investors with a team of deep experience in provider data systems, valuing authenticity, accountability, collaboration, results, and openness to feedback.
Design, build, and operate scalable cloud infrastructure using Kubernetes, Terraform, and modern infrastructure-as-code practices.
Improve and evolve cloud networking architecture, including VPC/VNet design, peering, routing, DNS, TLS, ingress/egress, and load balancing.
Contribute to system reliability through on-call support, incident response, root cause analysis, and performance optimization.
Jobgether is an AI-powered job matching platform that connects candidates with hiring companies. They use automated review and matching to ensure fair candidate evaluation.
Architect and scale the cloud platform behind a mission-critical SaaS product used globally.
Lead Infrastructure as Code maturity and drive automation, reliability, and cost optimisation.
Own uptime, SLAs, and incident management practices while mentoring engineers.
Innocraft (trading as Matomo) provides an open-source analytics platform trusted by enterprises and governments for full data ownership. The company values diversity and inclusion, and operates with a stable, mature product and a strong engineering team.
Design and develop CI/CD systems for websites, services, and release workflows, and operate an EKS-based Kubernetes platform.
Diagnose and debug production incidents, drive root-cause analysis, and implement improvements to enhance system reliability.
Write and maintain infrastructure as code using Pulumi or Terraform/OpenTofu across multiple AWS accounts with security-conscious practices.
Thunderbird is one of the world’s most trusted open-source email applications, empowering more than 20 million people globally. Our small but growing distributed team includes 65+ people across seven countries, and we build privacy-respecting communication tools with a collaborative, inclusive, and user-first spirit.
Own and evolve Webshare's production infrastructure by leading the migration from Docker Swarm to Kubernetes and maintaining high availability across hundreds of servers and ~50 services.
Drive observability, establish IaC practices, improve CI/CD pipeline reliability, and participate in the on-call rotation alongside backend developers.
Contribute platform tooling to improve developer experience and reduce infrastructure toil, avoiding silos and keeping infrastructure ownership shared.
We develop cutting-edge proxy and web data scraping solutions for thousands of the world's best-known businesses, including Fortune 500 companies. We are a team of 500+ professionals with a culture focused on growth, learning, and shared infrastructure ownership.
Own and evolve production-grade cloud infrastructure on Azure.
Design and maintain robust Infrastructure-as-Code (IaC) architectures utilizing Terraform.
Build and optimize end-to-end CI/CD pipelines using GitHub Actions.
CodeRoad provides end-to-end software development services, helping businesses scale with ideal infrastructure solutions. From staff augmentation to dedicated IT teams and general software engineering, their nearshore technology services empower businesses to thrive in an ever-evolving digital landscape.
Design, provision, and manage AWS infrastructure using Terraform and Kubernetes.
Build, operate, and improve observability, monitoring, and incident response processes.
Collaborate with engineering teams on capacity planning, performance optimization, and resilient system design.
Vynca provides comprehensive care for individuals with complex needs, focusing on quality days at home. The company is a close-knit community guided by core values of Excellence, Compassion, Curiosity, and Integrity.
Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
Take part in on-call rotation and drive root-cause resolution and long-term fixes.
ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.
Owning cloud infrastructure on Azure, data pipeline orchestration, CI/CD, and observability to ensure production-grade reliability.
Building and maintaining foundational infrastructure that enables fast engineering velocity without breaking things.
Applying SRE principles such as SLOs, capacity planning, incident response, and eliminating toil through automation.
Terzo's platform processes enterprise-scale document corpora, powers real-time AI agents, and serves the Financial Intelligence Graph to Fortune 500 customers. As a small, senior team with strong ownership and minimal bureaucracy, we foster a culture of collaboration, mentorship, and continuous improvement.
Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
Design and maintain the infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.
Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.
Act as a first responder for system incidents and outages, ensuring high availability and performance.
Own and evolve monitoring, alerting, and log management systems while optimizing database infrastructure.
Collaborate with engineering teams to build scalable, resilient systems and contribute to SRE tooling and automation.
Circle is building the world's leading all-in-one platform for online communities. We're a fully remote company of around 200 team members from 30+ countries, with a culture that values autonomy, async collaboration, and high expectations.
Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
Reddit is a community of communities, built on shared interests, passion, and trust, home to the most open and authentic conversations on the internet. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.
Build and maintain infrastructure platforms for over 200 backend services running on Kubernetes clusters with 40,000+ cores.
Lead and mentor other engineers, own complex infrastructure failures, and participate in a shared on-call rotation.
Drive cloud cost efficiency, estimate schedules, and use AI tools as first-class collaborators in daily workflows.
Life360's mission is to keep people close to the ones they love through location sharing, safe driver reports, and crash detection. The company serves approximately 97.8 million monthly active users across more than 180 countries and has more than 500 remote-first employees.
Design, build, and maintain cloud infrastructure across Azure, GCP, and AWS, including landing zones, Kubernetes, and CI/CD pipelines.
Implement monitoring, security, and hybrid connectivity for enterprise-scale cloud environments.
Collaborate cross-functionally, mentor engineers, and leverage AI tools to accelerate infrastructure development.
Applied is an Insurtech company that builds technology solutions for insurance professionals. With over 40 years of experience, they foster a culture of trust, inclusion, and growth.