Job Description
The Cloud Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of production-grade services deployed across multiple cloud vendors and infrastructure platforms for Smile Digital Health, its clients, and partners. This role designs and automates performance testing frameworks, integrates them into CI/CD pipelines, and uses observability tools to proactively detect and resolve bottlenecks. Working closely with engineering, product, and security teams, the SRE ensures systems meet strict SLAs for performance and availability while driving continuous optimization across those platforms.
Responsibilities include collaborating with Security Operations, developing multi-tenant service offerings, designing performance testing strategies, maintaining cost tracking for Cloud Service Providers, creating documentation, and maintaining relationships with core Cloud Service Providers. The role also involves implementing a secure infrastructure platform, ensuring SLAs are met, automating deployments, participating in on-call rotations, and maintaining internal tools. Ongoing compliance with organizational policies and accurate time tracking are required.
About Smile Digital Health
Smile Digital Health makes it easy for healthcare stakeholders to collect and exchange data with our leading FHIR-based data liberation platform.