Job Description
Pythian seeks a Site Reliability Engineer to operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems. The role involves automating workflows using Go, Python, and shell scripting; building monitoring and observability solutions with Prometheus, Grafana, and Loki; and troubleshooting complex networking, storage, and system performance issues. The SRE partners with AI/ML teams to ensure infrastructure readiness for model training and data pipelines, and participates in on-call rotations and postmortem reviews to improve system resilience.
About Pythian
Pythian is an expert in strategic database and analytics services, driving digital transformation and operational excellence.