Accountabilities:

Own the performance optimization and reliability of large-scale GPU clusters and InfiniBand networking environments supporting HPC workloads.
Tune and optimize GPU cluster performance and InfiniBand fabric efficiency to ensure high-throughput, low-latency computing.
Diagnose, troubleshoot, and resolve complex system-level issues across GPU, network, and compute layers.

Requirements:

5+ years of experience in system-level software engineering with a focus on performance, scalability, or infrastructure optimization.
3+ years of hands-on experience with Linux systems administration, debugging, and performance tuning.
Strong understanding of server and hardware architecture including PCIe, NICs, GPUs, and Linux kernel-level behavior.

Benefits:

Competitive compensation package.
Career development and continuous learning opportunities in advanced AI and HPC systems.
Flexible working arrangements and remote-friendly culture across Europe.

Jobgether

Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.

Senior HPC Cluster Engineer
