Own the performance, stability, and uptime of our global Azure infrastructure.
Lead and develop a high-performing operations team.
Drive automation and reduce manual operational overhead.
CluedIn is reshaping data management with an Azure-native, graph-based Modern Master Data Management (MDM) platform. They are trusted by global industry leaders and backed by five-star reviews on Gartner Peer Insights.
Continuously monitor infrastructure, cloud platforms, identity systems, networking, and security tooling using centralized monitoring and alerting solutions.
Mercer Advisors helps families amplify and simplify their financial lives by integrating financial planning, investment management, business management, tax, estate, insurance, and more, managed by a single team. They serve over 31,300 families across 90+ cities in the U.S. and are ranked the #1 RIA Firm in the nation by Barron’s for two consecutive years.
Serve as the top-tier technical expert to resolve complex issues across endpoints, identity, networks, and core business applications, documenting root causes and scalable solutions.
Maintain strong security and compliance across all IT workflows, applying data security principles (e.g., MFA, least-privilege access) and ensuring all records (tickets, assets, changes) are audit-ready.
Mentor junior team members, providing guidance and stepping in to handle frontline support during high-volume periods.
Equip is a virtual eating disorder treatment program that aims to ensure everyone with an eating disorder can access effective treatment. Founded in 2019, Equip has a highly engaged, passionate, and diverse culture, operating in all 50 states and partnering with most major health insurance plans.
Lead and grow high-performing platform engineering teams that deliver reliable, scalable infrastructure and operational excellence for Vanta’s products and customers.
Set technical direction and drive multi-quarter platform initiatives spanning infrastructure reliability, security, scalability, and developer experience across shared systems and services.
Partner closely with product engineering, security, and engineering leadership to identify organizational needs and deliver scalable platform solutions.
Vanta helps businesses earn and prove trust by empowering companies to practice better security and prove it with ease. They have a kind and talented team, and while some have prior security experience, many have been successful without it.
Support the Platform Infrastructure by managing container environments on EKS, implementing GitOps workflows, and maintaining CI/CD pipelines.
Build for Reliability by defining SLIs/SLOs, leading incident response, and contributing to disaster recovery planning.
Drive Observability by designing and maintaining monitoring and logging stacks with Datadog, Sentry, and CloudWatch.
Turquoise Health is a Series C price transparency platform for finance leaders across healthcare, building the infrastructure for a more open, efficient healthcare marketplace. The company is a remote-first, US-based team working with over 300 enterprise organizations, and it values transparency, empathy, inclusivity, creativity, and ownership.
Manage platform operations, analyze support requests, and prioritize technical resources.
Lead client operational status reviews, build relationships, and ensure client satisfaction with Managed Services.
Assist with commercial documentation, track project run rates, and identify relationship expansion opportunities.
TTEC Digital pioneers engagement and growth solutions to fuel exceptional customer experience (CX). It has over 1,800 employees. TTEC has been awarded the Great Place To Work 2024-2025 certification based on outstanding employee experience across 14 countries.
Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. They are headquartered in South Florida with a remote-first team spanning over 15 countries, with a high-growth, high-performance culture.
Build and improve scalable infrastructure operations processes that support a growing cloud platform.
Enable customer-facing and operational teams with secure automation, diagnostics, tooling and clear workflows.
Reduce repeatable manual work by identifying operational pain points and turning them into automated or self-service solutions.
NexGen Cloud delivers on-demand and private GPU infrastructure to a wide array of customers. They're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure, equipping their people with AI at every level.
Own production health, reliability, and operational support processes across critical systems and services.
Lead incident response efforts, stakeholder communication, root cause analysis, and post-incident reviews.
Design and implement AI-driven agents and workflows that automate support and operational tasks.
Quanata is on a mission to help ensure a better world through context-based insurance solutions. They are an exceptional, customer-centered team with a passion for creating innovative technologies, digital products, and brands. Quanata, LLC is wholly owned and funded by State Farm.
Defining and driving the vision and strategy for Infrastructure Observability.
Identifying gaps in the end-to-end experience, then defining and owning the roadmap to fill those gaps.
Working closely across teams and organizations, collaborating with Engineering, UX, Design, and other teams to deliver on your roadmap.
Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale — unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter.
Lead the design, execution, and continuous improvement of operational processes across the organization.
Manage and mentor operations staff, serving as a key liaison between sales and programs teams.
Oversee tool implementation and operational infrastructure projects, proactively identifying inefficiencies.
Vocal Media is focused on developing leaders, policies, and a team culture that embraces diversity and equity to deliver high-quality services to clients and creators. They operate with a commitment to fostering a diverse staff and a healthy organizational environment.
Lead software engineering teams providing infrastructure-as-code to manage cloud infrastructure.
Hire experienced site reliability staff and a line manager to grow and oversee the SRE team.
Establish design-before-build discipline; facilitate lightweight design documents, architectural decision records, and working group reviews.
Horizon3.ai is a cybersecurity company dedicated to enabling organizations to proactively find, fix, and verify exploitable attack vectors. They are a fast-growing company with a culture of respect, collaboration, ownership, and results.
Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.
Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.
Lead security incidents end-to-end, from detection and triage through containment and post-incident review, acting as incident commander.
Conduct hands-on investigations across cloud and endpoint environments to determine root cause and impact, and partner with Observability & Automation to improve detections and build automated playbooks.
Collaborate with Security, Infrastructure, and Product teams to identify gaps, strengthen the incident response lifecycle, and communicate effectively with both technical and non-technical stakeholders.
Affirm is reinventing credit to create honest and friendly financial products like buy now, pay later services without hidden fees. As a remote-first fintech company, they cultivate a collaborative and team-first culture for their skilled professionals.
Oversee a specialized SRE team focused on the design, deployment, and maintenance of automation toolsets.
Establish and enforce standards for IaC to ensure consistent, repeatable, and secure deployments.
Drive the automated lifecycle of both physical and virtual assets, from initial template creation/deployment to automated patching, scaling, and decommissioning.
Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. Led by CEO and Founder Michael Novogratz, their team blends deep crypto expertise with institutional experience and a shared commitment to shaping the future of Web3 and AI.
Serve as the primary technical point of contact and client liaison.
Perform comprehensive discovery of client environments during onboarding.
Develop strategic technology recommendations that advance IT maturity.
DYOPATH is dedicated to building trusted relationships. They translate complex technology into business value and guide organizations toward smarter IT decisions. They offer medical, dental, and vision coverage, life insurance, and a 401(k) with company match.
Build and maintain end-to-end observability with ELK, Prometheus, and Grafana.
Own and improve CI/CD pipelines (CircleCI, GitLab CI, GitHub Actions, ArgoCD).
Lead incident response and postmortems in a blameless culture.
Redcare Pharmacy is Europe’s No.1 e-pharmacy, powered by passionate teams and cutting-edge innovation. They strive to create a healthy, collaborative work environment where every employee feels valued and inspired to contribute to their vision “Until every human has their health”.
Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.
SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.