Source Job

Europe

  • Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments.
  • Design high-performance network fabrics across compute, storage, and WAN.
  • Engage directly with OEMs and vendors, validating hardware configurations.

HPC GPU InfiniBand RDMA Linux

8 jobs similar to HPC Cluster Architect

Jobs ranked by similarity.

Global

  • Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training.
  • Serve as the primary technical point of contact for customers running large-scale training workloads.
  • Build production-grade automation for cluster provisioning, GPU health checks, job scheduling, self-healing, and firmware/driver lifecycle management.

Andromeda Cluster gives early-stage startups access to scaled AI infrastructure. They work with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most and are expanding to find the brightest in AI infrastructure, research and engineering.

Europe

  • Key end customer contact for planning solution design and creating technical concepts.
  • Senior technical expert responsible for the design and implementation of Edge computing related technologies.
  • Design, develop, test, implement and support Edge computing components and applications.

Deutsche Telekom IT Solutions Slovakia has been part of the Košice region since 2006 and has grown into a major employer in the eastern part of the country. With over 3,900 employees, they aim to transform into a company providing innovative information and communication technology services.

Europe

  • Operate and maintain large-scale Linux environments (bare metal, clusters, cloud)
  • Help scale clusters toward hundreds to thousands of nodes
  • Automate operational tasks using tools like Python, Bash, Ansible, or Terraform

Mistral AI builds high-performance, open, and efficient AI systems designed to power the next generation of applications. They are a collaborative, low-ego, and highly technical team, operating across Europe, the US, and beyond.

US

  • Lead the design and implementation of high-performance data movement pipelines using NVIDIA NIXL across GPU, CPU, and storage tiers.
  • Architect and drive integration of DDN Infinia with GPU-accelerated inference platforms for large-scale, real-time AI workloads.
  • Own end-to-end optimization of I/O paths between GPU memory and storage using technologies such as NVIDIA GPUDirect Storage, RDMA, and NVMe-over-Fabrics.

DataDirect Networks (DDN) has been at the forefront of AI and high-performance data storage innovation for over two decades. DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence and is the global leader in AI and multi-cloud data management at scale.

$250,000–$290,000/yr
US

  • Own end-to-end fiber and network infrastructure deployment across colo data center sites.
  • Design fiber pathways, structured cabling systems, and high-density fiber distribution architectures.
  • Oversee installation of fiber (SMF/MMF), patch panels, trays, and cable management systems.

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. We celebrate different backgrounds, perspectives, and skills.

Australia

  • Serve as the primary technical partner between customers and Armada’s Product and Engineering teams, translating real‑world requirements into actionable designs.
  • Provide hands‑on technical guidance on AI Factory solutions, including modular and liquid‑cooled data centers and NVIDIA‑based GPU systems.
  • Advise customers on workload suitability, rack‑level design, system architecture, and deployment tradeoffs.

Armada is a full-stack edge infrastructure company delivering compute, connectivity, and sovereign AI/ML to some of the world’s most remote places. They are backed by top investors such as Microsoft (M12) and Founders Fund, and have strategic partnerships with companies including Starlink, Skydio, and NVIDIA.

Australia 5w PTO

  • Own the design, deployment and operation of OpenStack and Kubernetes environments.
  • Build and improve infrastructure using infrastructure-as-code and GitOps practices.
  • Optimise GPU workload scheduling using Kubernetes and NVIDIA tooling.

NexGen Cloud is building next-generation GPU cloud infrastructure, and is the company behind Hyperstack, a high-performance cloud platform designed for compute-intensive workloads. We're a scale-up by design, solving complex infrastructure challenges at pace, with real-world impact.

US Unlimited PTO

  • Design end-to-end network architectures incorporating sub-sea, dark fiber, IP transit, and WAN/LAN.
  • Act as the lead technical advisor on complex deals, delivering tailored, high-confidence network solutions.
  • Ensure technical validation and consistency across all network deployments and advisory engagements.

Inflect is revolutionizing how companies buy and sell digital infrastructure (data center, cloud, and network services). They are removing friction so any company can get what it needs, when it needs it, with the right deal terms, freeing businesses to build and unlocking human innovation.