Similar Jobs
Systems Engineer, HPC (APAC)
Mistral AI
APAC
Linux
Python
Bash
HPC Engineer
Jobgether
US
Linux
AWS
Terraform
AI Infrastructure & Platform Operations Engineer
Mirantis
Europe
Linux
Kubernetes
Networking
Senior AI Infrastructure & Platform Operations Engineer
Mirantis
Europe
Linux
Kubernetes
Networking
Senior Software Engineer, Machine Learning Inference Platform
Stack AV
US
C++
Python
PyTorch
Accountabilities:
- Own the performance optimization and reliability of large-scale GPU clusters and InfiniBand networking environments supporting HPC workloads.
- Tune and optimize GPU cluster performance and InfiniBand fabric efficiency to ensure high-throughput, low-latency computing.
- Diagnose, troubleshoot, and resolve complex system-level issues across GPU, network, and compute layers.
Requirements:
- 5+ years of experience in system-level software engineering with a focus on performance, scalability, or infrastructure optimization.
- 3+ years of hands-on experience with Linux systems administration, debugging, and performance tuning.
- Strong understanding of server and hardware architecture including PCIe, NICs, GPUs, and Linux kernel-level behavior.
Benefits:
- Competitive compensation package.
- Career development and continuous learning opportunities in advanced AI and HPC systems.
- Flexible working arrangements and remote-friendly culture across Europe.
Jobgether
Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.