Ensure the reliability, resilience, and safe operations of Qonto’s critical storage systems.
Deliver concrete improvements in disaster recovery readiness, safe upgrades, alerting, and capacity planning, with visible impact on production.
Drive a platform engineering mindset by building automation, tooling, and APIs to improve the developer experience and prepare our infrastructure for the future of AI-operated systems.
Design and build the core data infrastructure powering Vantage's platform.
Own architecture decisions for systems built on ClickHouse, Temporal, Kubernetes, and Postgres.
Drive reliability, performance, and scalability initiatives across the platform as data volume and customer load grow.
Vantage is the FinOps platform built for modern engineering teams. They are a high-output team of ~50 employees based in New York City with a remote-friendly culture.
Operate and evolve multi-cloud streaming clusters and related database infrastructure, diagnosing and eliminating cross-layer failure modes.
Define and evolve the technical direction for operating shared database systems at scale, leading complex initiatives and reliability investments.
Mentor and support engineers, reduce operational toil through automation, and partner with database and platform teams to align on strategy.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics, logs, and traces, and they thrive in an innovation-driven environment where transparency, autonomy, and trust fuel everything.
Support the availability and durability of critical services across production environments.
Develop automation for common operational tasks, reducing manual intervention and toil.
Partner with engineering, product, and operations teams to support resilient system design and operations.
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business on less than $3 million in outside funding until 2021, and they now generate over $100 million in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnosing and eliminating cross-layer failure modes.
Designing safe upgrade and rollout strategies at scale.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open-source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.
Leading a team focused on designing, building, and evolving cloud-native, containerized infrastructure.
Driving complex technical initiatives and ensuring the availability, security, scalability, and reliability of our data ecosystem.
Guiding and developing engineering talent, setting priorities, driving execution, and partnering across teams.
Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing and financial market infrastructure. Pismo has 500+ employees located in more than 10 countries around the world and was acquired by Visa in 2024.
Manage and support infrastructure for Growth teams, including Nomad, HashiStack, databases, and other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.
Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime.
Drive continuous improvement in infrastructure automation, deployment, and orchestration.
Mistral AI is dedicated to democratizing AI through high-performance, optimized, open-source models, products, and solutions designed to integrate seamlessly into daily working life. They are a dynamic, collaborative team, passionate about AI and its potential to transform society, and dedicated to innovation.
Working with engineers across Yelp to support new features and services.
Integrating tools to monitor platform stability and performance.
Helping scale our Kubernetes clusters and AWS-based infrastructure while maintaining our platform's SLOs.
Yelp's engineering culture values individual authenticity and encourages creative solutions. They focus on helping users, growing as engineers, and having fun in a collaborative environment.
Lead efforts to improve system reliability, scalability, and performance across critical services
Define and implement SLIs/SLOs and error budgets, and use them to guide engineering priorities
Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise
UJET is an AI-powered contact center innovation company, delivering a cloud platform that redefines the customer experience. They are built on a cloud-native architecture and partner with businesses to deliver exceptional interactions and accelerated growth in the AI-driven world.
Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.
Act as a primary or escalation responder in a 24x7 on-call rotation
Automate repetitive operational tasks to reduce manual toil
Support and troubleshoot Linux-based systems and cloud platforms (AWS, Azure, GCP)
NiCE Ltd.'s software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime, and ensure public safety. NiCE is consistently recognized as the market leader in its domains, has over 8,500 employees across 30+ countries, and is recognized as an innovation powerhouse that excels in AI, cloud, and digital.
Build Self-Service Infrastructure: Design and scale highly available Infrastructure as Code (IaC) modules using Terraform. Empower development teams to provision resources autonomously and securely.
Champion Platform Reliability: Partner closely with engineering teams to define, measure, and operationalize SRE metrics. Balance feature velocity with system stability.
Elevate Developer Experience (DevEx): Architect frictionless, GitOps-driven CI/CD pipelines utilizing GitHub Actions and ArgoCD. Facilitate automated, secure, and progressive deployments.
KTO Group drives excitement in iGaming through innovation, focusing on transparency and player satisfaction. Founded in 2018, KTO blends sports betting with online casino entertainment on a proprietary platform, and is a rising leader in LATAM, ranked among Brazil’s top 10 iGaming brands.
Support the underlying infrastructure to fight financial crime.
Work with a small, passionate, and experienced team.
Influence the overall architecture and hosting strategy.
Hummingbird is a remote-first company united by the mission of fighting financial crime. They are customer-obsessed and love building SaaS products, with a culture that values diverse opinions and supports employees.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Design, implement, and operate highly available PostgreSQL clusters.
Optimize query performance and indexing strategies.
Build and maintain automation for deployment tasks.
Wavelo provides flexible software that modernizes how communication service providers (CSPs) do business, helping them drive more value, focus on customer experience, and scale their operations faster. As part of Tucows, Wavelo is backed by outstanding resources and talent, embracing a people-first philosophy rooted in respect, trust, and flexibility.
Design, build, and operate critical data infrastructure platforms.
Ensure high availability, scalability, and performance of platform services.
Drive improvements in developer productivity through automation and tooling.
Angi helps homeowners connect with reliable professionals for home service projects. With over 2,800 employees worldwide, they foster an environment where homeowners, pros, and employees benefit from streamlined services.
Drive technical direction across integration platform and delivery work, with clear trade-offs and pragmatic sequencing.
Define and raise engineering quality standards for integrations (testing expectations, reliability/monitoring, security, logging), including for work done by external developers/partners.
Lead initiatives that enable scale, like reusable patterns/tooling (e.g., IDK) and frameworks that reduce one-off implementation work.
Pleo builds spend solutions that make managing money seamless, empowering, and surprisingly effective for finance teams and employees alike - with a vision to help all businesses ‘go beyond’. They are a driven, progressive, and kind bunch of 850+ people from over 100 nationalities, all committed to delivering the future of business spending, together.
Manage and support hybrid-cloud infrastructure for the Payward Services business unit, including Nomad, Kubernetes, and databases.
Build automation tooling, maintain CI/CD pipelines, and consult on monitoring and alerting best practices to ensure service reliability.
Provide operational support, participate in incident response, and debug complex distributed system issues across production and staging environments.
Kraken is a mission-focused company building premium crypto products for traders and institutions, dedicated to accelerating global crypto adoption for financial freedom. It is a fully remote company with a global team of industry pioneers spread across 70+ countries, operating with a strong crypto ethos and commitment to security and education.
Own the SRE roadmap end-to-end, setting priorities independently and driving execution to make the team's impact visible across the organization.
Drive compliance, security, and infrastructure topics for your business unit by identifying risks early and owning their resolution before they escalate.
Lead a 4–6-person generalist SRE team through 1:1s, performance cycles, and meaningful career development while contributing technical credibility to architectural discussions.
Kraken is a mission-focused cryptocurrency exchange building the future of crypto and blockchain technology. It is a fully remote company with employees in over 70 countries, offering premium crypto products and services for traders and institutions while emphasizing security, education, and client support.
Lead technical strategy and architecture for agentic workflows and Self Service Account Management features, ensuring safety, auditability, and scalability.
Act as a force-multiplier by setting engineering standards, conducting rigorous design reviews, and coordinating delivery across Product, Operations, and partner teams.
Own operational readiness including observability, incident playbooks, and post-launch validation to minimize reliability and compliance risks.
Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without hidden fees or interest. They are a remote-first company with competitive benefits, emphasizing a people-first culture and providing full premium coverage for employees and dependents.