Job Description

As a Senior Site Reliability Engineer at Runwise, you will maintain the stability and performance of our services, ensuring they are reliable, scalable, and fault-tolerant. You’ll work closely with hardware and software engineers to build and maintain tools that improve the reliability and efficiency of our systems.

Responsibilities will include, but are not limited to:

* Design and maintain scalable infrastructure in AWS cloud and distributed on-prem systems
* Automate infrastructure provisioning, deployment pipelines, and operational workflows using tools like Terraform, Ansible, or Helm
* Build and improve monitoring, alerting, and observability systems (e.g., Cloud Health, Grafana)
* Collaborate with development teams to improve service reliability, performance, and scalability
* Participate in on-call rotation and manage incident response, including root cause analysis and postmortems
* Define and track service-level objectives (SLOs) and service-level indicators (SLIs)
* Conduct capacity planning, chaos testing, and disaster recovery exercises
* Advocate for engineering best practices across CI/CD, security, and fault tolerance

About Runwise

Runwise is a customer-focused climate-tech startup that controls and runs the key energy systems in buildings throughout the US, reducing energy usage and carbon output.

