We are looking for a seasoned Site Reliability Engineer (SRE) to join our distributed team. This is a fully remote position. As a key member of our DevOps team, you will design, implement, and maintain mission-critical monitoring, alerting, and incident response systems. Your work will ensure the high availability, reliability, and performance of our infrastructure and the scalable services it runs in production. You will partner closely with engineering teams throughout the development lifecycle, contributing to planning, design, deployment, and reliability goals.
The tech stack includes AWS, Azure, Grafana Cloud, Kubernetes, ArgoCD, Elixir, Node.js, Python, TypeScript, React, Terraform, CloudFormation, Ansible, and GitHub Actions. The role offers three weeks of paid vacation, generous medical, dental, and vision plans, and a competitive salary.