Key Responsibilities:

  • Act as the primary NVIDIA AI Enterprise (NVAIE) and vector database expert for HyperPOD customer environments.
  • Own complex end-to-end triage across GPU, NVAIE services, vector DB, Kubernetes, Docker, high-speed networking, and Infinia storage.
  • Diagnose and resolve performance bottlenecks in RAG and agentic AI workflows, from model selection to vector search.

Required Experience & Skills:

  • 5+ years in Linux-based infrastructure roles supporting production systems; strong hands-on experience with containers and Kubernetes.
  • Demonstrated experience operating GPU-accelerated workloads in production, including NVIDIA GPUs, CUDA, and GPU Operator.
  • Practical experience with AI storage and networking for HPC/AI clusters, including high-performance storage and RDMA-accelerated networking.

What Success Looks Like:

  • Within 6–12 months, become the go-to internal expert across Support, PS, Product, and NPI on how the AI and networking stack works in production.
  • Drive speed and quality of support at the solution level through high-quality diagnostics, architecture insight, and well-defined patterns.
  • Establish clear, repeatable triage and escalation patterns for AI-side incidents that L1/L2 storage engineers can follow confidently.

DataDirect Networks

DataDirect Networks (DDN) is a global market leader in AI and high-performance data storage, powering many of the world's most demanding AI data centers across industries such as life sciences, healthcare, financial services, and research. DDN combines innovation and customer-centricity with a team of passionate professionals committed to shaping the future of AI and data management.
