Are you an experienced Site Reliability Engineer who thrives at the intersection of software engineering and production operations?
Do you take pride in keeping mission-critical customer systems reliable under real-world operational pressure?
Are you looking for an opportunity to own production reliability for a modern hybrid infrastructure platform spanning cloud, colocation, and edge environments?

Primary Responsibilities:

Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
Drive multi-replica and multi-cluster high availability across Climavision’s .NET services by refactoring C# code for safe horizontal scaling.
Support and coordinate production incident response, including troubleshooting, mitigation, communication, and postmortem analysis.

On-Call Expectations:

Participate in a primary on-call rotation, taking one full week of duty at a time with 24/7 availability including nights, weekends, and holidays.
Acknowledge pages within the response-time SLO, drive incidents to resolution, and maintain reliable connectivity while on call.
Plan personal time around the published rotation, and arrange documented coverage swaps when conflicts are unavoidable.

About Climavision:

Climavision is rebuilding climate technology with a proprietary network of high-resolution weather radars and satellites, reducing coverage gaps and improving forecasting. Backed by The Rise Fund, Climavision is a growing company headquartered in Louisville, KY, with R&D in Raleigh, NC.