Similar Jobs

See all

Accountabilities:

  • Own the performance optimization and reliability of large-scale GPU clusters and InfiniBand networking environments supporting HPC workloads.
  • Tune and optimize GPU cluster performance and InfiniBand fabric efficiency to ensure high throughput and low-latency computing.
  • Diagnose, troubleshoot, and resolve complex system-level issues across GPU, network, and compute layers.

Requirements:

  • 5+ years of experience in system-level software engineering with a focus on performance, scalability, or infrastructure optimization.
  • 3+ years of hands-on experience with Linux systems administration, debugging, and performance tuning.
  • Strong understanding of server and hardware architecture including PCIe, NICs, GPUs, and Linux kernel-level behavior.

Benefits:

  • Competitive compensation package.
  • Career development and continuous learning opportunities in advanced AI and HPC systems.
  • Flexible working arrangements and remote-friendly culture across Europe.

Jobgether

Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.

Apply for This Position