Responsibilities:
- Own the reliability, performance, and operational health of production AI systems, focusing on improving complex, existing services.
- Lead efforts to refactor and harden the AI codebase to improve observability, maintainability, and resilience.
- Diagnose and resolve issues across distributed systems.
Requirements:
- Proven experience designing, building, and operating distributed systems in production.
- Strong understanding of service architecture, concurrency, resource management, and distributed failure modes.
- Hands-on experience running production services on Kubernetes.
Bonus Points:
- Experience collaborating with ML or data science teams to productionize predictive systems.
- Ability to improve system architecture and engineering practices over time through design, code review, and mentorship.
MixMode
MixMode is a leading provider of AI-powered cybersecurity solutions at scale, pioneering a patented third-wave, context-aware AI approach. Large organizations with big data workloads trust MixMode to defend their most important assets.