As a Senior Cluster Site Reliability Engineer (SRE), you will help scale our research compute cluster to meet our growing needs, and you will leverage engineering skills to ensure high degrees of uptime, reliability, and robustness. Our research clusters are at the core of our R&D, and you will be directly responsible for keeping this key resource available and performant. Your work will provide a world-class HPC platform for researchers.
Remote Devops Jobs · SRE
12 results
FiltersJob listings
$205,000–$235,000/yr
US
4w PTO
Lead and grow an infrastructure team focused on SRE and DevX initiatives. Drive reliability, observability, and performance across Maze’s services and infrastructure. Design and evolve internal tooling, CI/CD pipelines, and cloud infrastructure that enable product teams to move fast and safely. Prepare infrastructure and tooling to support AI/ML workloads at scale.