Sr Linux System Administrator

Fal

Key Responsibilities:

  • Own the full lifecycle of our bare-metal GPU server fleet: provisioning, imaging, configuration management, patching, and decommissioning across multiple data centers and providers.
  • Build and maintain our server automation stack using Ansible, Terraform, and custom tooling to manage OS configuration, kernel parameters, driver versions, and firmware updates at scale.
  • Tune Linux systems for AI workloads: kernel parameters, NUMA topology, CPU pinning, hugepages, I/O schedulers, and GPU driver stack optimization (NVIDIA drivers, CUDA, container runtimes).

Requirements:

  • 8+ years of experience administering Linux systems at scale, ideally in GPU cloud, HPC, or large bare-metal environments.
  • Deep expertise in Linux internals: systemd, kernel tuning (sysctl, cgroups, namespaces), boot process, package management, and performance profiling (perf, bpftrace, sar).
  • Strong experience with configuration management and infrastructure-as-code: Ansible, Terraform, cloud-init, PXE/iPXE, and custom imaging pipelines.

About Fal:

Fal is a company focused on providing a GPU cloud platform. It offers visa sponsorship and relocation assistance to San Francisco, and holds regular team events and offsites.