Manage and support infrastructure for Growth teams, including Nomad, Hashistack, databases, and any other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.
Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Design, implement, and operate cloud-native infrastructure for production workloads.
PointClickCare's mission is to help providers deliver exceptional care. They are a leading, founder-led, privately held health tech company that empowers employees to push boundaries, innovate, and shape the future of healthcare. With the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, their platform serves over 30,000 provider organizations.
Arista Networks is a data-driven, client-to-cloud networking company for large data center, campus, and routing environments. They have over $8 billion in revenue and value diversity of thought and perspectives, fostering an inclusive environment for creativity and innovation.
Support the availability and durability of critical services across production environments.
Develop automation for common operational tasks, reducing manual intervention and toil.
Partner with engineering, product, and operations teams to support resilient system design and operations.
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business with less than $3 million in outside funding until 2021, and now generate over $100 million in revenue, managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
Ensure the availability, reliability, performance, and security of our SaaS platform
Lead infrastructure automation efforts using Infrastructure as Code and Configuration Management tools
Define and monitor SLAs/SLOs/SLIs, and drive service quality improvements
Remote People builds the infrastructure to power borderless teams. Their technology enables businesses to hire anyone anywhere compliantly at the push of a button. They are committed to building a global, diverse team representing different and varied backgrounds, perspectives, and experiences.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly's product powers connections for millions through impactful innovation. They are in the midst of exciting growth and are looking for people who want to learn, grow, and do their best work.
Collaborate with stakeholders to drive best practices for monitoring and CI/CD pipelines
Troubleshoot deployment issues in our CI pipeline
Identify areas for automation and embrace the codification of all things
Weedmaps is a global leader in the cannabis industry. They are dedicated to transparency, education, and community, serving cannabis consumers and businesses in the U.S. and worldwide.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly powers connections for their customers through impactful innovation. They have millions of users and are in the midst of exciting product growth.
Collaborate with application engineering teams on platform infrastructure.
Enhance observability and spearhead the adoption of SRE best practices.
Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.
Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.
Design and implement the complex distributed infrastructure that powers our core AI engine and distributed analysis systems.
Tune and optimize cloud services across compute, storage, networking, and observability to drive performance and reliability.
Develop our core services, written in TypeScript, Kotlin, and Go, to support our unique deployment and infrastructure requirements.
XBOW is building the future of offensive security. They create the platform that puts security ahead in the arms race, using AI to autonomously discover, validate, and exploit vulnerabilities. Founded by Oege de Moor, the company is backed by Sequoia, Altimeter, and other leading investors.
Lead the Infrastructure Engineering team, taking full ownership of cloud infrastructure, Kubernetes platforms, DevOps tooling, and CI/CD pipelines.
Drive reliability, scalability, and security across the production environment while maintaining a sharp focus on developer velocity and business impact.
Mentor and guide engineers across SRE, DevOps, and Database Reliability functions, fostering a culture of operational excellence and pragmatic problem-solving.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs with an all-in-one B2B platform. They have raised $346 million, are expanding across key EU markets, and foster innovation, prioritizing research and solutions that benefit users, employees, partners, and the business.
Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
Build and maintain infrastructure-as-code using Terraform.
Improve CI/CD pipelines to increase deployment safety and velocity.
Legion Technologies delivers the industry’s most innovative workforce management platform. The AI-driven Legion WFM platform maximizes labor efficiency and employee engagement. They are a remote, mission-driven team that embraces a collaborative, fast-paced, and entrepreneurial culture.
Build and deploy computing services and infrastructure in customer environments.
Clarify and surface requirements from ambiguous use cases defined by cross-functional stakeholders.
Improve reliability and scalability by resolving edge cases, studying failure modes, and writing tests.
Planet designs, builds, and operates the largest constellation of imaging satellites in history. They deliver an unprecedented dataset of empirical information via a revolutionary cloud-based platform to authoritative figures in commercial, environmental, and humanitarian sectors. Planet takes a people-centric approach toward culture and community, and strives to iterate in a way that puts its team members first and prepares the company for growth.
Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving their customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.
Lead the push toward a modern, cloud-native organization by designing and managing scalable, resilient systems on AWS.
Own the Infrastructure as Code (IaC) strategy using Terraform, ensuring environments are repeatable, versioned, and stable.
Build and optimize high-velocity deployment pipelines using GitHub Actions, ArgoCD, and Helm to get code from "commit" to "production" seamlessly.
TrueML is undergoing a major platform rearchitecture, moving toward a fully cloud-native, modernized infrastructure. They seem to be a medium-sized company with a focus on innovation and providing engineers with the tools and data they need to make smart, impactful choices.
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnosing and eliminating cross-layer failure modes.
Designing safe upgrade and rollout strategies at scale.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.
Propel builds technology that strengthens the social safety net. They are a passionate team of ~100 Propellers who envision a future where every American has the tools and resources they need to thrive, offering a remote-first working environment with headquarters in Brooklyn.