Key Responsibilities:
- Act as the primary NVIDIA AI Enterprise (NVAIE) and vector database solutions expert for HyperPOD customer environments.
- Own complex end-to-end triage across GPU, NVAIE services, vector DB, Kubernetes, Docker, high-speed networking, and Infinia storage.
- Diagnose and resolve performance bottlenecks in RAG and agentic AI workflows, from model selection to vector search.
Required Experience & Skills:
- 5+ years in Linux-based infrastructure roles supporting production systems; strong hands-on experience with containers and Kubernetes.
- Demonstrated experience operating GPU-accelerated workloads in production, including NVIDIA GPUs, CUDA, and GPU Operator.
- Practical experience with AI storage and networking for HPC/AI clusters, including high-performance storage and RDMA-accelerated networking.
What Success Looks Like:
- Within 6–12 months, become the go-to internal expert for how the AI and networking stack works in production across Support, PS, Product, and NPI.
- Drive speed and quality of support at the solution level through high-quality diagnostics, architecture insight, and well-defined patterns.
- Establish clear, repeatable triage and escalation patterns for AI-side incidents that L1/L2 storage engineers can follow confidently.
DataDirect Networks
DataDirect Networks (DDN) is a global market leader in AI and high-performance data storage, powering many of the world's most demanding AI data centers across industries like life sciences, healthcare, financial services, and research. The company combines strong innovation and customer-centricity with a team of passionate professionals committed to shaping the future of AI and data management.