About the Role:
- Own the full architecture cycle, from customer conversation to deployment.
- Take direct ownership of cluster architecture across compute, networking, storage, and physical design.
- Translate customer requirements into production-ready GPU deployments.
What You'll Be Doing:
- Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments.
- Design high-performance network fabrics across compute, storage, and WAN.
- Provide technical oversight during deployment and bring-up.
About You:
- Proven experience designing and delivering GPU-based HPC or AI clusters at scale.
- Deep hands-on knowledge of NVIDIA GPU platforms and NVIDIA reference architectures.
- Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server-level performance.
NexGen Cloud
NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers, from AI researchers to enterprises running the world's most compute-intensive workloads. We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure.