Oversee a specialized SRE team focused on the design, deployment, and maintenance of automation toolsets.
Establish and enforce standards for IaC to ensure consistent, repeatable, and secure deployments.
Drive the automated lifecycle of both physical and virtual assets, from initial template creation/deployment to automated patching, scaling, and decommissioning.
Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. Led by CEO and Founder Michael Novogratz, their team blends deep crypto expertise with institutional experience and a shared commitment to shaping the future of Web3 and AI.
Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
Refine alerts and dashboards for critical services to catch issues earlier.
Automate routine checks and monitoring tasks to free up engineers.
PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics, from ticketing and streaming to fundraising and merchandise.
Own and operate end-to-end infrastructure for backend services, frontend systems and databases.
Build and maintain reliable deployment workflows including CI/CD pipelines and rollback procedures.
Improve system-wide observability through metrics, logging, alerting, and monitoring to ensure uptime.
Jito Labs builds a high-performance trading terminal on Solana. They are a lean, high-output team building something that sits at the intersection of execution quality, user experience, and on-chain infrastructure.
Design, build, and maintain infrastructure using Infrastructure as Code tools such as Terraform.
Improve system reliability, scalability, resilience, and performance across the Mast platform.
Build systems and tooling that automate infrastructure management and operational workflows wherever possible.
Mast is on a mission to make complex lending simple by building modern, cloud-native lending technology purpose-built for specialist lenders. It is a high-performance team of engineers and lending experts that values radical honesty, transparency, and speed.
Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.
SupplyHouse.com is an industry-leading e-commerce company that has specialized in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a people-first community built on Generosity, Respect, Innovation, and Teamwork (GRIT).
Create, deploy, and manage high-performance servers.
Serve millions of requests globally with sub-second latency.
Shape systems from the ground up, free of legacy infrastructure.
Entefy is working to create the fastest data syncing experience ever built. They hold data syncing, consistency, and uptime to the highest standards and are looking for someone to manage high-performance servers.
Provide production support in shifts according to the team's on-call roster.
When not on call, work on tickets raised by customers and by the internal engineering and implementation teams.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus builds and maintains technology that is transforming the GovTech industry by bringing governments and their constituents together. They serve 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers, and are known as one of the best companies to work for.
Build and improve scalable infrastructure operations processes that support a growing cloud platform.
Enable customer-facing and operational teams with secure automation, diagnostics, tooling and clear workflows.
Reduce repeatable manual work by identifying operational pain points and turning them into automated or self-service solutions.
NexGen Cloud delivers on-demand and private GPU infrastructure to a wide array of customers. They're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure, equipping their people with AI at every level.
Build internal tooling to help other engineers and the rest of the company understand and operate our system.
Design and implement security best practices for our team and infrastructure.
Reduce toil through automation, including building and maintaining CI/CD infrastructure.
Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.
Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.
Remote is solving modern organizations’ biggest challenge: navigating global employment with ease while staying compliant. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.
Performing day-to-day operational/DevOps tasks on Wikimedia’s public facing infrastructure.
Implementing and utilizing configuration management and deployment tools.
Leading continuous improvement by automating the installation, configuration, and maintenance of services on our platform.
The Wikimedia Foundation operates Wikipedia and other Wikimedia free knowledge projects with the vision of a world where every single human can freely share in the sum of all knowledge. As a charitable, not-for-profit organization, it relies on donations and has staff members based in 40+ countries.
Lead customer onboarding end-to-end and extend it with additional use cases.
Own a small portfolio of customer accounts and act as a trusted technical partner year-round.
Provide technical support and communicate crisply with customers throughout.
OpsMill is building the next generation of infrastructure data management, focusing on helping automation teams unify data and scale automation reliably. As a commercial open-source company, they are practitioners who understand the real-world challenges of scaling infrastructure automation.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate the analysis and resolution of production issues.
Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.
Improve the reliability, performance, and scalability of our production platform.
Operate reliable infrastructure, improve observability, and drive incident response.
Use data-driven reliability practices such as SLIs, SLOs, SLAs, and DORA metrics.
VRChat is a game-changing platform that provides an endless collection of social VR experiences. They empower their community to bring their imaginations to life and help shape the metaverse. Their team includes people from Netflix, Twitter, Meta, and Microsoft.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads.
Automate infrastructure provisioning using Infrastructure as Code tools such as Terraform and CloudFormation.
Operate and scale our Kubernetes platform, including autoscaling, ingress, and multi-tenant isolation for enterprise customers.
Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. It is a small team with high ownership, emphasizing automation, collaboration, and transparency.
Design systems with resilience, graceful degradation, and capacity in mind.
Define and measure SLOs and SLIs that reflect what our customers actually experience.
Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.
EarnIn is building products that deliver real-time financial flexibility for people with the unique financial needs that come with living paycheck to paycheck. They are growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of their growth journey.
Architect future iterations of core systems, addressing scaling requirements.
Design and implement developer tools to enhance deployment safety and reproducibility.
Drive excellence in monitoring and guide incident response for quick issue resolution.
Found provides tools for self-employed individuals, offering a business bank account that automates taxes and expense tracking. They aim to give self-employed people the security and peace of mind historically available only at large corporations and are looking for kind, resourceful, and passionate people.
Design and implement secure, scalable infrastructure in Azure, integrating security best practices.
Partner with the infrastructure team to enhance the reliability and performance of systems.
Lead security incident response efforts within the Azure ecosystem and automate responses.
Mesh's mission is to enable consumers to pay and be paid with any asset by making crypto payments reliable and ubiquitous. Backed by leading investors, Mesh combines a powerful orchestration engine with a seamless consumer app to unlock liquidity for the world.
Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
Lead incident response and postmortems in a blameless culture.
Redcare Pharmacy is Europe’s No. 1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision: “Until every human has their health.”