Role Responsibilities:
- Build and maintain CI/CD pipelines, infrastructure automation, and observability tools to support multi-agent systems.
- Collaborate with development, operations, and infrastructure teams to streamline processes and troubleshoot issues.
- Strengthen the security and compliance posture for SOC 2 and HIPAA using cloud and DevOps best practices.
Agentic AI Integration:
- Design and operate auto-remediation agents for production toil with human-in-the-loop controls.
- Use LLMs for incident triage and root cause analysis, integrating AI agents through the Model Context Protocol.
- Establish operational guardrails and best practices for AI coding assistants in infrastructure repositories.
About You:
- 5+ years of experience in DevOps, SRE, or Platform Engineering operating production systems on AWS.
- Strong experience with CI/CD pipelines, production EKS environments, AWS networking, and Terraform.
- Deep experience with Aurora/RDS MySQL, Redis, and S3, plus Prometheus, Grafana, and OpenTelemetry for observability.
About Zingtree:
Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. The team is small, with high ownership and an emphasis on automation, collaboration, and transparency.