Cloud cost optimization – identify waste, drive rightsizing, build tooling and guardrails to prevent cost regressions.
Platform reliability and scalability – improve observability, define SLOs where they're missing, and harden the systems all of Stream's products depend on.
Architecture and infra evolution – evaluate and drive decisions on Kubernetes adoption, database architecture, and cloud provider strategy.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Own and drive the architectural direction for critical infrastructure platforms that support GitLab at global scale.
GitLab is the intelligent orchestration platform for DevSecOps. They enable organizations to increase developer productivity, improve operational efficiency, reduce security and compliance risk, and accelerate digital transformation. GitLab has a high-performance culture driven by their values.
Lead the Infrastructure Engineering team, taking full ownership of cloud infrastructure, Kubernetes platforms, DevOps tooling, and CI/CD pipelines.
Drive reliability, scalability, and security across the production environment while maintaining a sharp focus on developer velocity and business impact.
Mentor and guide engineers across SRE, DevOps, and Database Reliability functions, fostering a culture of operational excellence and pragmatic problem-solving.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs with an all-in-one B2B platform. They have raised $346 million, are expanding across key EU markets, and foster innovation, prioritizing research and solutions that benefit users, employees, partners, and the business.
Propel builds technology that strengthens the social safety net. They are a passionate team of ~100 Propellers who envision a future where every American has the tools and resources they need to thrive, offering a remote-first working environment with headquarters in Brooklyn.
Tech lead two teams (DevEx and Cloud Infrastructure) totaling 6–8 engineers: set technical direction, review key designs/changes, and raise engineering standards across both domains.
Own the delivery toolchain end-to-end (Git, CI, deployments/releases): reduce flakiness, improve build/test times, make releases repeatable with clear rollback, and drive adoption of org-wide standards through tooling, docs, and supported migrations.
Improve the software development lifecycle (setup → build/test → PR → deploy → observe) and standardize environments so teams spend less time on tooling and more time shipping.
Traackr is a global SaaS technology company providing a data-driven influencer marketing platform that marketers use to optimize investments, streamline campaigns, and scale programs. They are a remote-first company with offices in San Francisco, New York, Boston, Paris, and London and operate on a culture of mutual respect.
Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
Upsun (formerly Platform.sh) is a cloud application platform designed for hybrid teams, enabling developers, DevOps engineers, and platform teams to build, ship, and scale confidently without backend infrastructure hassles. Upsunners are a remote, global workforce committed to open source and an open, welcoming environment, valuing curiosity, knowledge, and innovative ideas.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production.
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Define long-term architectural strategy for multi-cloud compute and traffic platforms.
Provide mentorship to engineers through design reviews and code contributions.
Partner with Security to build ‘secure by default’ systems.
Temporal Technologies develops an open-source programming model that simplifies code and enhances application reliability. With a focus on developer experience and open-source software, they foster a culture of curiosity, collaboration, and genuine impact.
Dive into client environments to explore application workloads, infrastructure dependencies, and security controls.
Aid in designing and implementing migration strategies to reduce risks and unlock automation opportunities.
Develop scalable and secure infrastructure using Infrastructure as Code (IaC) tools.
Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers. At Kunai, they help their clients modernize, capitalize on emerging trends, and evolve their business for the coming decades by remaining tech-agnostic and human-centered.
Partner with engineering leadership, EMs, and Product Managers to define and deliver AI products.
Architect scalable, high-performance systems that support a growing number of AI-powered products.
Drive technical strategy and make architectural decisions that compound, enabling the team to ship more AI experiences faster.
Webflow is building the world’s leading AI-native Digital Experience Platform as a remote-first company built on trust, transparency, and a whole lot of creativity. They empower teams to design, launch, and optimize for the web without barriers, from entrepreneurs launching their first idea to global enterprises scaling their digital presence.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.
Own and evolve the uptime monitoring platform to enhance customer capabilities.
Deploy a ClickHouse instance to capture check run logs and design APIs for reporting.
Collaborate with customers to resolve bugs affecting their infrastructure.
Jobgether is a platform posting jobs on behalf of partner companies. We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.
We are looking for a FinOps engineer who packs the technical chops of an SRE, but brings experience with cloud cost management and capacity planning.
Someone technical enough that engineers trust their architectural advice, but commercially minded enough to partner with Finance and explain the why behind our spend.
We need proactive people who can fully own projects and get them done, and who know when to ask for help.
PostHog provides every product that companies need to run their business, from their first day to the day they IPO and beyond. They are product-led, default alive, and well-funded, with a focus on building an awesome product for end-users and hiring exceptional teammates.
Design and implement the complex distributed infrastructure that powers our core AI engine and distributed analysis systems.
Tune and optimize cloud services across compute, storage, networking, and observability to drive performance and reliability.
Develop our core services, written in TypeScript, Kotlin, and Go, to support our unique deployment and infrastructure requirements.
XBOW is building the future of offensive security. They create the platform that puts security ahead in the arms race, using AI to autonomously discover, validate, and exploit vulnerabilities. Founded by Oege de Moor, the company is backed by Sequoia, Altimeter, and other leading investors.
Design, develop, and maintain core cloud platform services including compute orchestration, resource management, multi-tenancy, and API gateway components using Go, Java, Python, or Rust.
Build and optimize RESTful/gRPC APIs and microservices that support cloud resource provisioning, lifecycle management, and monitoring on Bitdeer AI Cloud.
Develop scalable, fault-tolerant distributed systems that handle high-throughput workloads across multi-region deployments.
Bitdeer is a world-leading technology company for Bitcoin mining and AI cloud. They are committed to providing comprehensive Bitcoin mining solutions for their customers, designing industry-leading ASIC chips, and manufacturing mining rigs, with operations globally and a diversified 3 GW energy portfolio.
Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Design, build, and manage our cloud infrastructure using modern tools (Pulumi) to ensure all infrastructure changes are reproducible, secure, and easily auditable.
Orchestrate and optimize our Kubernetes clusters for complex, compute-heavy AI workloads, guaranteeing maximum efficiency and fault tolerance.
Implement a flawless monitoring setup using Datadog and OpenTelemetry to make the black box of our distributed systems transparent, hunting down latency spikes or bottlenecks before they impact users.
Deepslate is building speech-to-speech voice AI models that sound and act indistinguishable from a human, with the belief that everyone should be able to use them. Backed by top-tier investors from the tech and AI sectors, we are incredibly well-funded and moving fast.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Define and execute the reliability engineering roadmap.
Establish SLO/SLI/error budget frameworks for system stability.
Drive continuous improvement through DORA metrics and analysis.
Jobgether leverages AI for HR solutions. They focus on connecting talent with opportunities, using AI-driven matching to ensure fair and objective application reviews.