Job Description
We are seeking an experienced Site Reliability Engineer (SRE) to keep our systems running smoothly and efficiently while engineering solutions that improve visibility, eliminate repetitive tasks, and increase system resilience. The ideal candidate will balance real-time on-call responsibilities with strategic engineering work to achieve sustainable, scalable service reliability.
As an SRE, you will participate in on-call rotations as the primary technical lead for detecting, triaging, and resolving service degradations, outages, and other reliability issues across all environments. During major incidents, you will act as Incident Commander: initiating war-room or bridge calls, coordinating cross-functional teams, providing timely and clear status updates to all stakeholders, and leading and documenting blameless Root Cause Analyses (RCAs) that identify underlying causes and drive long-term fixes. You will also develop automation that eliminates manual, repetitive operational work (toil) across both applications and infrastructure, improving efficiency and system resilience. You will create and maintain dashboards and alerts that track application and infrastructure health, and participate in feature development discussions to ensure services are built with observability from the ground up. In collaboration with Product and Engineering teams, you will define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Finally, you will investigate and resolve customer complaints escalated beyond L1 and L2 support, especially those involving performance, reliability, or complex system behavior.
About Moniepoint
Moniepoint is an all-in-one financial services platform for emerging markets and the second-fastest-growing company in Africa.