Job Description
Mattermost is seeking a highly skilled Site Reliability Engineer (SRE) to help design, operate, and improve the infrastructure powering our secure, mission-critical collaboration platform. As part of our globally distributed Engineering team, you will focus on reliability, scalability, performance, and automation across cloud and hybrid environments.
You will play a key role in ensuring our systems are observable, resilient, and efficient, working closely with development, security, and operations teams to deliver exceptional uptime and performance to our customers in defense, government, and critical infrastructure sectors.
Responsibilities include:
Build, maintain, and optimize containerized workloads for production environments
Implement infrastructure-as-code for repeatable and reliable deployments
Implement and maintain compliant cloud environments that meet the regulatory and security requirements of customers in highly regulated sectors (e.g., FedRAMP, DoD)
Establish and maintain observability solutions for monitoring, alerting, and performance tuning
Perform incident response for production systems, including root cause analysis and remediation
Drive automation to reduce manual operations and improve system reliability
Collaborate across teams to design scalable, secure, and compliant architectures
Participate in an on-call rotation for production systems
About Mattermost
Mattermost builds the #1 collaborative workflow solution for defense, intelligence, security, and critical infrastructure organizations.