Lead Site Reliability Engineer

QCSS Health ⚕️🏥📊

Salary range

$120,000–$130,000/year

Benefits

Job Description

Design and implement site reliability engineering best practices across cloud infrastructure and application services hosted on AWS. Deploy and manage performance monitoring tools to track key application and infrastructure metrics. Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) across platform services.

Lead technical responses during critical incidents to restore service availability with minimal downtime. Perform root cause analysis and coordinate postmortems to ensure follow-up improvements are implemented. Identify reliability risks and propose infrastructure improvements to enhance system performance and fault tolerance.

Collaborate with software engineering, DevOps, and database teams to embed reliability into the development lifecycle. Advise teams on system design for high availability, capacity planning, and disaster recovery.

About QCSS Health

QCSS Health offers solutions to support Medicaid health plans and providers in improving health care outcomes for vulnerable populations.
