Platform Reliability & Enablement:
- Support and evolve the reliability of platforms used by the AI Research team.
- Ensure production services meet expectations for availability, latency, and operational readiness.
- Design infrastructure and operational patterns that prioritize iteration speed.
Embedded Collaboration:
- Act as an advisor on infrastructure, reliability, and operational concerns.
- Participate directly in team planning and execution, from early exploration through production rollout.
- Help researchers self-serve infrastructure safely and effectively.
Cloud Infrastructure & Operations:
- Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.
- Design and operate observability systems using tools such as Datadog.
- Participate in an on-call rotation, responding to incidents and helping improve systems.
Algolia
Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. The company has raised $150 million in Series D funding, quadrupling its valuation to $2.25 billion, and is investing in its market-leading platform.