Role Expectations:
- Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments
- Drive an "automation-first" culture by writing code (Python/Go) to eliminate manual toil and build self-healing systems
- Implement and maintain sophisticated observability (Prometheus, Grafana, OpenTelemetry), define SLIs/SLOs, and establish error budgets
Minimum Qualifications:
- 8+ years of experience managing reliability, scalability, and availability for large-scale production services
- Deep expertise in programming (e.g., Python, Go, or C/C++)
- Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture
Preferred Qualifications:
- Extensive experience with public cloud (AWS, Azure, GCP) and Infrastructure-as-Code (Ansible, Terraform)
- Experience with chaos engineering and disaster recovery planning at scale
- Expertise in global routing (BGP) and traffic tunneling (GRE, IPsec), with a deep understanding of L7 proxy architectures (HAProxy), DNS at scale, and OS networking stack internals
Zscaler
Zscaler accelerates digital transformation so that its customers can be more agile, efficient, resilient, and secure. It is an AI-forward enterprise that leverages the world’s largest security data lake.