Job Description
As the Lead Site Reliability Engineer, you will define the strategy, architecture, and roadmap for Mattermost's site reliability engineering function, aligning infrastructure initiatives with product and business goals. You will lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
You will establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale. You will drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements. You will partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
About Mattermost
At Mattermost, we build the #1 collaborative workflow solution for defense, intelligence, security, and critical infrastructure organizations.