Building and operating Kubernetes compute superclusters across multiple clouds. Partnering with cloud providers to optimize infrastructure cost, performance, and reliability for AI workloads. Working closely with research teams to understand their infrastructure needs and to identify ways to improve the stability, performance, and efficiency of novel model training techniques. Designing and building resilient, scalable systems for training AI models, with a focus on intuitive user interfaces that let researchers troubleshoot and resolve problems on their own. Encouraging software best practices across our company and participating in team processes such as knowledge sharing, reviews, and on-call.