Platform Reliability & SLAs:
- Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
- Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
- Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
On-Call & Incident Response:
- Participate in an on-call rotation and act as incident commander for high-severity production events
- Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
- Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
What We're Looking For:
- 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
- Deep hands-on Kubernetes expertise; you understand the scheduler, networking, storage, and autoscaling at a level where you can debug anything
- Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route 53), storage (S3, RDS), and IAM
Akuity
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage development and deployment across hundreds, if not thousands, of Kubernetes clusters from a single control plane.