Own and evolve AWS infrastructure using Terraform, managing EKS clusters, databases, and core services.
Maintain CI/CD reliability and developer tooling across the full engineering org.
Lead incident response, drive post-incident reviews, and improve monitoring and alerting standards.
Babylist is the leading platform for expecting and new families, helping parents feel confident, connected, and cared for at every step. As a modern, AI-forward tech company with over 10 million yearly shoppers, Babylist has expanded into a full ecosystem and generated $750M in revenue in 2025, reshaping the $235B kids and baby market.
Design and develop CI/CD systems for websites, services, and release workflows, and operate an EKS-based Kubernetes platform.
Diagnose and debug production incidents, drive root-cause analysis, and implement improvements to enhance system reliability.
Write and maintain infrastructure as code using Pulumi or Terraform/OpenTofu across multiple AWS accounts with security-conscious practices.
Thunderbird is one of the world’s most trusted open-source email applications, empowering more than 20 million people globally. Our small but growing distributed team includes 65+ people across seven countries, and we build privacy-respecting communication tools with a collaborative, inclusive, and user-first spirit.
Take ownership of incident management and operational excellence across cloud infrastructure.
Automate high-risk manual processes and drive reliability gains through engineering.
Own a platform domain such as Temporal, observability, or Kubernetes operations.
Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London with offices across Europe and the US, and has over $530 million in funding from premier investors like Accel and Nvidia's VC arm.
Design, provision, and manage AWS infrastructure using Terraform and Kubernetes.
Build, operate, and improve observability, monitoring, and incident response processes.
Collaborate with engineering teams on capacity planning, performance optimization, and resilient system design.
Vynca provides comprehensive care for individuals with complex needs, focusing on quality days at home. The company is a close-knit community guided by core values of Excellence, Compassion, Curiosity, and Integrity.
Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
Take part in the on-call rotation and drive root-cause resolution and long-term fixes.
ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.
Build and maintain infrastructure platforms for over 200 backend services running on Kubernetes clusters with 40,000+ cores.
Lead and mentor other engineers, own complex infrastructure failures, and participate in a shared on-call rotation.
Drive cloud cost efficiency, estimate schedules, and use AI tools as a first-class collaborator in daily workflows.
Life360's mission is to keep people close to the ones they love through location sharing, safe driver reports, and crash detection. The company serves approximately 97.8 million monthly active users across more than 180 countries and has more than 500 remote-first employees.
Design and build cloud-native infrastructure for reliability, observability, and automation across GCP, GKE, and Cloud Run.
Own incident response, root cause analysis, escalation workflows, and runbooks to prevent hard problems from recurring.
Develop Infrastructure as Code, CI/CD pipelines, and operational tooling to improve developer velocity and platform efficiency.
CertifyOS is building the data infrastructure that powers modern healthcare, automating provider licensing, enrollment, credentialing, and network monitoring through an API-first platform. The company is backed by leading investors with a team of deep experience in provider data systems, valuing authenticity, accountability, collaboration, results, and openness to feedback.
Design, implement, and improve Site Reliability Engineering practices across production environments with a focus on SLOs, SLIs, and error budgets.
Lead incident response processes and build observability strategies including monitoring, logging, alerting, and distributed tracing.
Partner with engineering teams to enhance system reliability, availability, scalability, and operational efficiency.
Oowlish is a rapidly expanding software development company in Latin America that collaborates with premier clients from the United States and Europe to create pioneering digital solutions. Certified as a Great Place to Work, it offers a nurturing environment with opportunities for professional growth and international impact.
Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.
Remote solves modern organizations' biggest challenge of navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture with core values of innovation and async work.
Act as a first responder for system incidents and outages, ensuring high availability and performance.
Own and evolve monitoring, alerting, and log management systems while optimizing database infrastructure.
Collaborate with engineering teams to build scalable, resilient systems and contribute to SRE tooling and automation.
Circle is building the world's leading all-in-one platform for online communities. We're a fully remote company of around 200 team members from 30+ countries, with a culture that values autonomy, async collaboration, and high expectations.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Own and operate 100+ multi-cloud streaming clusters and related database infrastructure in production.
Diagnose and eliminate cross-layer failure modes such as object storage latency, noisy neighbors, and query performance regressions.
Design safe upgrade and rollout strategies at scale, improving observability, automation, and operational ergonomics.
Grafana Labs is the company behind the open observability cloud, providing a fully managed observability platform built for scale. With over 35 million users and 7,000+ customers, we are a 100% remote company of 1,600+ team members across 40+ countries, backed by leading investors.
Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.
VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.
Build and operate the self-service infrastructure platform for developers and AI agents.
Own core platform layers including CI/CD, GitOps, IaC module catalog, and golden-path scaffolding.
Build internal tooling, observability, and metrics to make pipelines observable and improvable.
Luxury Presence is building the AI growth platform for real estate. Backed by top investors like Bessemer Venture Partners, we're a Series C company with over $100M in ARR and more than 90,000 real estate professionals using our platform.
Co-own the architecture of cloud infrastructure on Azure and Kubernetes clusters for high throughput and availability.
Drive resilience strategy for global scaling, zero-downtime deployments, and disaster recovery.
Evolve the observability stack with LGTM (Loki, Grafana, Tempo, Mimir) and lead incident response.
Flip is an AI-powered employee experience platform for frontline workers in retail, manufacturing, and logistics. The company is a young, rapidly growing tech company with a remote-first culture and offices in Berlin and Stuttgart.
Architect and scale the cloud platform behind a mission-critical SaaS product used globally.
Lead Infrastructure as Code maturity and drive automation, reliability, and cost optimisation.
Own uptime, SLAs, and incident management practices while mentoring engineers.
Innocraft (trading as Matomo) provides an open-source analytics platform trusted by enterprises and governments for full data ownership. The company values diversity and inclusion, and operates with a stable, mature product and strong engineering team.
Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
Design and maintain the infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.
Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.
Design and evolve cloud-native, containerized infrastructure for data products and services.
Lead cross-functional technical initiatives ensuring availability, security, scalability, and reliability.
Contribute hands-on expertise in systems design, automation, and high-scale distributed systems.
Visa is a world leader in payments technology, facilitating transactions across more than 200 countries. A large global company, Visa focuses on innovation and uplifting everyone, everywhere.
Architect and maintain infrastructure as code with Terraform.
Set up monitoring, alerting, and incident response.
We're a UK fintech building high-throughput digital infrastructure for the mortgage and property space. We recently acquired Trussle and are taking our platform to the next level. We value innovation and building high-quality products.