Own performance optimization and reliability of large-scale GPU clusters and InfiniBand networking for HPC workloads.
Diagnose and resolve complex system-level issues across GPU, network, and compute layers, integrating new hardware components.
Develop automation for monitoring, fault detection, and proactive remediation in distributed compute environments.
Our partner is building a next-generation AI cloud infrastructure environment, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.
Define and own the architectural roadmap for enterprise, data center, and cloud networks.
Co-author secure design and lifecycle management of high-performance data center networking.
Build an AI-agent based analysis framework for network changes and self-improvement.
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs, delivering industry-leading training and inference speeds. The company partners with top model labs, global enterprises, and AI-native startups, including OpenAI, and has a non-corporate work culture that values continuous learning and growth.