Job Description
As part of the Network Software and Services for AI (nssAI) team at xAI, you'll build cutting-edge software, services, and frameworks to empower our Network Development Engineers. Working hands-on, you’ll tackle all facets of network management—metric collection, configuration, zero-touch provisioning, monitoring, and auto-remediation—driving automation-first solutions for xAI’s production and ancillary networks.
Expect to develop extensible tools, streamline complex processes, and ensure rock-solid reliability to support xAI’s mission of accelerating human scientific discovery through AI. The role is based in our Palo Alto, California or Memphis, Tennessee offices, or can be remote. Expect travel to Palo Alto for inter-team collaboration and to the data center for hands-on experience with the software you write and to identify other opportunities for improvement. The focus will be on building software and tools with extensive metrics coverage for some of the world’s largest GPU supercomputing network fabrics, which are used for AI training and for serving customer inference queries, as well as on implementing infrastructure-as-code (IaC) best practices, enhancing deployment pipelines, and ensuring robust, secure service delivery across our production environments.
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.