Own and manage high‑priority, business‑critical customer escalations through resolution.
Actively triage incoming critical cases and review long‑running incidents to determine next actions and risk mitigation.
Deliver clear, concise, and executive‑level written and verbal communications throughout the incident lifecycle.
Blueprint Technologies is a technology solutions firm headquartered in Bellevue, Washington. They are unified by a shared passion for solving complicated problems, and leverage cutting-edge technology to create additional revenue streams and new lines of business for their clients.
Serve as the primary Incident Commander for critical security events.
Orchestrate response efforts across multiple teams.
Conduct post-incident reviews and drive improvements.
GitLab is the intelligent orchestration platform for DevSecOps. They enable organizations to increase developer productivity, improve operational efficiency, reduce security and compliance risk, and accelerate digital transformation. GitLab has more than 50 million registered users and is trusted by more than 50% of the Fortune 100, which reflects a high-performance culture driven by their values and continuous knowledge exchange.
Own the strategy, execution, and continuous improvement of Filevine's site reliability and platform resilience.
Directly manage the prioritization for the teams responsible for keeping Filevine fast, stable, and available.
Drive measurable improvements in uptime, incident prevention, and release confidence across the platform.
Filevine is a Legal AI company delivering Legal Operating Intelligence for the future of legal work. They bring together data, documents, workflows, and teams into one unified platform and are ranked as one of the most innovative and fastest-growing technology companies in the country.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production.
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Provide technical expertise in support of the Department of Veterans Affairs (VA) End User support and Operations Monitoring contract within Major Incident Management (MIM).
IT Concepts dba Kentro drives innovation, fosters professional growth, and positively impacts communities. They are a close community of experts that pride themselves on creating an environment defined by teamwork, dedication, and excellence.
Develop and maintain observability solutions using platforms like Datadog, Prometheus and Grafana
Take a leading role in incident management, including coordinating response efforts, troubleshooting issues, and identifying follow-up actions
Partner with product engineering teams to architect reliable systems, recover from incidents, and learn from mistakes
Ditto is redefining how data moves at the edge, aiming to make resilient, real-time applications seamless for developers, regardless of network conditions. It's a globally distributed and fast-growing startup with over $145 million in funding that is committed to building a diverse and inclusive team.
Own the technical relationship for a portfolio of monitoring company customers, acting as their trusted advisor and escalation point
Partner with customers to design, launch, and optimize call flows, including telephony routing, failover strategies, and integration patterns
Proactively monitor system performance, identify risks, and drive improvements to reliability, latency, and call success rates
RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. They are in an exciting phase of growth, welcoming new members from across the globe to their mission-driven, ambitious, and inclusive team.
Provide strategic leadership for the CPE organization, spanning cloud infrastructure, platform services, and operational enablement.
Lead and develop CPE managers and senior technical leaders, setting clear expectations for execution, quality, and delivery.
Ensure platform reliability, scalability, and security through strong operational processes, observability, and incident management.
Kinaxis is a global leader in end-to-end supply chain management, enabling supply chain excellence for all industries. They have over 2000 employees around the world and are working towards solving some of the biggest challenges facing supply chains today.
Identify and respond to security incidents on a global scale.
Act as an incident commander to drive incidents through the entire response lifecycle.
Conduct threat hunting activities, anticipate future threats, and maintain forward-thinking strategies for tools/technology/processes that combat sophisticated threat actors.
Mozilla Corporation is a non-profit-backed technology company that has shaped the internet for the better over the last 25 years. With more than 225 million people around the world using their products each month, they’re shaping the next 25 years of technology and helping to reclaim an internet built for people, not companies.
Design, develop, and maintain solutions on the ServiceNow platform.
Take ownership of ServiceNow platform functionality, ensuring scalability and stability.
Implement and enhance incident management, risk, and operational resilience workflows.
Smart Working believes your job should feel right every day and welcomes you into a genuine community that values your growth and well-being. They break down geographic barriers and connect skilled professionals with outstanding global teams and products for full-time, long-term roles.
Run daily operations of the MCC, including monitoring live services and managing incident response.
Handle customer requests and tickets within committed SLA response times.
Create, refine, and follow policies and procedures for incident management, escalation, and communication.
Rocket Science Group is a co-development game studio specializing in multiplayer, platform services, publishing technology, and live operations for PC, console, and mobile titles. They have teams in Europe and North America and work in partnership with the game industry’s top creators.
Spearhead the strategic development and execution of global payroll risk and incident management.
Lead critical programs to enhance our risk posture and drive resolution of high-impact incidents.
Partner across teams to embed risk-aware thinking into everyday operations.
Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. They make it possible for businesses of all sizes to recruit, pay, and manage international teams. With their core values at heart and future-focused work culture, their team works tirelessly on ambitious problems, asynchronously, around the world.