Manage and support infrastructure for Growth teams, including Nomad, Hashistack, databases, and any other underlying systems
Maintain and troubleshoot GitLab CI pipelines, ensuring reliable and fast build, test, and deployment cycles
Provide operational support across Onboarding, Acquire, and Engage teams, helping debug issues in staging and production environments
Kraken is a mission-focused company rooted in crypto values, aiming to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. As a fully remote company, they have Krakenites in 70+ countries who speak over 50 languages.
Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime.
Drive continuous improvement in infrastructure automation, deployment, and orchestration.
Mistral AI is dedicated to democratizing AI through high-performance, optimized, open-source models, products, and solutions designed to integrate seamlessly into daily working life. They are a dynamic, collaborative team dedicated to innovation and passionate about AI and its potential to transform society.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Own the SRE roadmap end-to-end, setting priorities independently and driving execution to make the team's impact visible across the organization.
Drive compliance, security, and infrastructure topics for your business unit by identifying risks early and owning the resolution before they escalate.
Lead a 4–6-person generalist SRE team through 1:1s, performance cycles, and meaningful career development while contributing technical credibility to architectural discussions.
Kraken is a mission-focused cryptocurrency exchange building the future of crypto and blockchain technology. It is a fully remote company with employees in over 70 countries, offering premium crypto products and services for traders and institutions while emphasizing security, education, and client support.
Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
Diagnosing and eliminating cross-layer failure modes.
Designing safe upgrade and rollout strategies at scale.
Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.
Operate and improve platform tools so product teams can ship reliably, triaging tickets, fixing build issues, and handling routine service requests.
Maintain and extend self-service workflows by updating docs, examples, and guardrails under guidance from senior engineers.
Perform day-to-day Kubernetes operations: deploy/update Helm charts, manage namespaces, diagnose rollout issues, and follow runbooks for incident response.
ISHIR is a digital innovation and enterprise AI services provider. They work with startups and enterprises to shape the future through accelerated innovation, deep technical expertise, access to global digital talent and a passion for complex problem-solving. ISHIR attracts proactive individuals who thrive on challenges and promote self-reliance, open communication, and collaboration.
Ensure the reliability of our critical products and services by meeting or exceeding SRE objectives.
Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools.
Automate deployments, administration, and monitoring of our services by following CI/CD practices.
Sectigo delivers certificate lifecycle management (CLM) solutions that secure human and machine identities. They are one of the largest CAs with over 700,000 customers and strive to delight their customers and become the market leader in their industry.
Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.
SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.
Act as a primary or escalation responder in a 24x7 on‑call rotation
Automate repetitive operational tasks to reduce manual toil
Support and troubleshoot Linux‑based systems and cloud platforms (AWS, Azure, GCP)
NiCE Ltd. software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. With over 8,500 employees across 30+ countries, NiCE is consistently recognized as the market leader in its domains and as an innovation powerhouse that excels in AI, cloud and digital.
Lead efforts to improve system reliability, scalability, and performance across critical services
Define and implement SLIs/SLOs and error budgets, and use them to guide engineering priorities
Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise
UJET is an AI-powered contact center innovation company, delivering a cloud platform that redefines the customer experience. They are built on a cloud-native architecture and partner with businesses to deliver exceptional interactions and accelerated growth in the AI-driven world.
Support the availability and durability of critical services across production environments.
Develop automation for common operational tasks, reducing manual intervention and toil.
Partner with engineering, product, and operations teams to support resilient system design and operations.
Backblaze is the object storage leader in the open cloud movement, fueling customer success with cloud storage built purposefully to unlock budgets and unleash innovators. Founded in 2007, they scaled the business with less than $3 million in outside funding until 2021, and generate over $100m in revenue managing over three billion gigabytes of data storage for 500K+ customers in 175+ countries.
Design, implement, and operate cloud-native infrastructure for production workloads.
PointClickCare's mission is to help providers deliver exceptional care. They are a leading health tech company, founder-led and privately held, that empowers its employees to push boundaries, innovate, and shape the future of healthcare. They have the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, and their platform serves over 30,000 provider organizations.
Collaborate with stakeholders to drive best practices for monitoring and CI/CD pipelines
Troubleshoot deployment issues in our CI pipeline
Identify areas for automation and embrace the codification of all things
Weedmaps is a global leader in the cannabis industry. They are dedicated to transparency, education, and community, serving cannabis consumers and businesses in the U.S. and worldwide.
Operate and evolve multi-cloud streaming clusters and related database infrastructure, diagnosing and eliminating cross-layer failure modes.
Define and evolve the technical direction for operating shared database systems at scale, leading complex initiatives and reliability investments.
Mentor and support engineers, improve systems toil with automation, and partner with database and platform teams to align on strategy.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which features scalable metrics, logs, and traces. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything.
Building tools and applications to extend Calendly’s infrastructure platform
Evaluating and deploying cloud native open source tools
Exercising expertise in cloud infrastructure concepts and patterns
Calendly's product powers connections for millions through impactful innovation. They are in the midst of exciting growth and are looking for people who want to learn, grow, and do their best work.
Design, develop, and support cloud and locally hosted solutions that facilitate ease of service deployment, availability, and operations
Continuously improve processes and infrastructure to be easy to deploy, scalable, secure, and fault-tolerant
Automate operational, testing, installation, and other processes to increase efficiency and stability
Zimperium is the world leader in mobile security, purpose-built to protect the modern mobile enterprise. Trusted by leading organizations and governments, their AI-driven platform delivers real-time, on-device protection for mobile applications and devices.
Provide production support on a shift according to the team on-call roster.
Work on tickets raised by customers and by internal engineering/implementation teams while not on call for production support.
Continuously monitor the health and performance of our services, systems, and infrastructure.
Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the Govtech industry by bringing governments and their constituents together. They have served 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers.
Contribute to the design and evolution of hybrid infrastructure systems.
Build and enhance internal tools and automation to improve scalability.
Partner with Dev, DevOps, and QA teams to resolve infrastructure or deployment blockers.
DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers. DDN's cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data.
Partner with product and platform engineering teams to improve system reliability, scalability, and developer experience
Build, maintain, and evolve CI/CD pipelines to support safe, fast, and reliable deployments
Improve observability through better monitoring, alerting, logging, and telemetry
Zipline is a SaaS company transforming how frontline teams work. They empower leading brands across retail, healthcare, logistics, and beyond. Zipline is a fully remote company with employees across the U.S., Canada, and around the globe.
Architect and scale AWS infrastructure, including container orchestration and observability platform development.
Lead infrastructure builds for compliance (SOC 2, HIPAA) and harden container workloads across environments.
Own the shared infrastructure stack, CI/CD pipelines, and reliability practices including SLOs and incident response.
Truv is transforming the financial data industry with a secure, real-time API platform for payroll account access, streamlining income verification and direct deposit switching. It is a well-funded, innovative startup backed by top investors, with a leadership team from companies like Apple, Carta, and Venmo.