Technical Operations & Service Reliability:
- Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
- Support large-scale NVIDIA GPU infrastructure and high-performance networking environments.
- Analyze platform performance, capacity, stability, and reliability trends to proactively identify risks.
Platform Operations & Engineering:
- Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
- Drive improvements in platform reliability, observability, monitoring, and operational processes.
- Support the adoption and operation of AI-powered infrastructure services through k0rdent AI.
Technical Leadership:
- Mentor and support AI Infrastructure & Platform Operations Engineers.
- Develop and maintain operational standards, runbooks, troubleshooting guides, and best practices.
- Act as a trusted technical advisor during operational planning and service improvement initiatives.
Mirantis
Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.