Drive the design of Nebius' next-generation AI infrastructure, making end-to-end architectural decisions across compute, networking, and storage. Architect scalable GPU cluster topologies, including compute nodes, interconnect (InfiniBand, Ethernet), storage, and control planes. Analyze AI/ML workloads (e.g., LLM training and inference) to inform design trade-offs across latency, bandwidth, and GPU density at pod and data-center scale.
Partner with site reliability, networking, storage, and data-center engineering teams to operationalize and scale the architecture.

Nebius offers competitive compensation: a base salary of $150k to $180k plus quarterly performance bonuses.