Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving its customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly's product powers connections for millions through impactful innovation. They are in the midst of exciting growth and desire people that want to learn, grow, and do their best work.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly powers connections for its customers through impactful innovation. They have millions of users and are in the midst of exciting product growth.
Manage and support infrastructure for Growth teams, including Nomad, Hashistack, databases, and any other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.
Design, build, and maintain scalable, highly available and fault-tolerant infrastructures.
Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime.
Drive continuous improvement in infrastructure automation, deployment, and orchestration.
Mistral AI is dedicated to democratizing AI through high-performance, optimized, open-source models, products, and solutions designed to integrate seamlessly into daily working life. They are a dynamic, collaborative team dedicated to innovation and passionate about AI and its potential to transform society.
Own the strategy, execution, and continuous improvement of Filevine's site reliability and platform resilience.
Directly manage the prioritization for the teams responsible for keeping Filevine fast, stable, and available.
Drive measurable improvements in uptime, incident prevention, and release confidence across the platform.
Filevine is a Legal AI company delivering Legal Operating Intelligence for the future of legal work. They bring together data, documents, workflows, and teams into one unified platform and are ranked as one of the most innovative and fastest-growing technology companies in the country.
Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the Govtech industry by bringing governments and their constituents together. They have served 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers.
Support the availability and durability of critical services across production environments.
Develop automation for common operational tasks, reducing manual intervention and toil.
Partner with engineering, product, and operations teams to support resilient system design and operations.
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business with less than $3 million in outside funding until 2021, and generate over $100m in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
Own the architecture, development, and operation of scalable, secure, and fault-tolerant cloud services.
Drive technical design and architectural decisions for distributed systems, influencing patterns, standards, and long-term platform evolution.
Lead complex initiatives end-to-end, from design through deployment and ongoing optimization.
ExtraHop is a company focused on reinventing Network Detection and Response (NDR) to offer enterprises unparalleled visibility, context, and control against emerging threats. They integrate NDR with Network Performance Management (NPM), Intrusion Detection Systems (IDS), and forensics, providing a single, comprehensive solution.
Lead the design and implementation of scalable, secure, and resilient cloud infrastructure across AWS and Azure.
Drive the architectural vision and strategy, ensuring alignment with long-term business goals.
Take the lead on automating and accelerating SDLC processes by identifying bottlenecks.
Candidly flips the script on planning, borrowing, repaying, and saving for college and is a category leader with an AI-driven student debt and savings optimization platform. They partner with hundreds of top employers and have a fully remote, international team of 70+ including alumni from Google, UBS, and Twitter.
Lead efforts to scale and improve our infrastructure.
Develop and support internal team tooling.
Troubleshoot, debug and resolve issues as part of a shared on-call rotation.
Lillio, formerly HiMama, empowers early childhood educators through innovative tools. They are a Series B, private-equity backed company recognized as an industry leader and selected in 2025 by Time Magazine as one of the world's top EdTech companies.
Leading a team focused on designing, building, and evolving cloud-native, containerized infrastructure.
Driving complex technical initiatives and ensuring the availability, security, scalability, and reliability of our data ecosystem.
Guiding and developing engineering talent, setting priorities, driving execution, and partnering across teams.
Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing and financial market infrastructure. Pismo has 500+ employees located in more than 10 countries around the world and was acquired by Visa in 2024.
Deliver a scalable internal infrastructure platform on public cloud environments.
Establish and evolve Kubernetes-based platform capabilities to support high-availability, production-grade workloads at scale.
Build a secure and reliable foundation that supports CI/CD pipelines and minimizes operational risk across engineering teams
Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnosing and eliminating cross-layer failure modes.
Designing safe upgrade and rollout strategies at scale.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.
Design, build, and operate core cloud infrastructure across compute, storage, databases, and networking layers.
Own and improve the reliability, scalability, and security of Valon’s production systems as we scale to support major enterprise deployments.
Evaluate, adopt, and operationalize new infrastructure technologies (e.g., Vitess, Clickhouse, Redis) to meet evolving product and scale requirements.
Valon is building the AI-native operating system for regulated finance, starting with mortgage servicing. They are a Series C company backed by a16z, transforming industries that others have written off as too complex to innovate.
Learn platform infrastructure, developer tooling, and deployment patterns.
Own and drive at least one architecture decision that improves platform reliability.
Ship infrastructure improvements that measurably improve developer experience or platform stability.
Homebot is a homeownership platform for lenders and real estate, title & insurance agents that drives client retention and partner referrals. They maintain a clear focus on culture, engagement, and creating an environment where people are valued and can thrive.
Working with engineers across Yelp in supporting new features and services.
Integrating tools to monitor platform stability and performance.
Helping scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.
Yelp's engineering culture values individual authenticity and encourages creative solutions. They focus on helping users, growing as engineers, and having fun in a collaborative environment.
Partner with product and platform engineering teams to improve system reliability, scalability, and developer experience
Build, maintain, and evolve CI/CD pipelines to support safe, fast, and reliable deployments
Improve observability through better monitoring, alerting, logging, and telemetry
Zipline is a SaaS company transforming how frontline teams work. They empower leading brands across retail, healthcare, logistics, and beyond. Zipline is a fully remote company with employees across the U.S., Canada, and around the globe.
Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.