Job Description

Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
Continuously analyze and improve system performance across compute, storage, and network layers.
Implement advanced monitoring, logging, and tracing solutions while leading incident response to minimize impact and drive postmortem culture.
Maintain and evolve infrastructure pipelines to enable safe, fast, and reliable changes.
Coach and mentor team members in best practices for site reliability.

About Vultr

Vultr is on a mission to make high-performance cloud infrastructure easy to use, affordable, and locally accessible for enterprises and AI innovators around the world.
