Job Description
As a Software Engineer on the Supercomputing Platforms and Infrastructure team, you will build the next generation systems that power large scale AI research and deployment. You will focus on sandboxed execution environments, distributed systems orchestration, and performance optimized compute workflows. You will work closely with ML and Research teams and infrastructure teams to deliver both high throughput, scale, and strong isolation guarantees in a cluster environment.
Build highly scalable, highly performant, software that facilitates arbitrary code execution with strong isolation guarantees. Design and build systems that allow our AI models to interface with machines in various modes, interactive terminal, GUI applications, etc. Provision and operate high density compute and storage nodes and build software that performs efficient load balancing, and resource utilization across them.
Instrument and optimize end to end performance. Troubleshoot complex infrastructure issues across OS, drivers, hardware, storage systems, networking, namespace isolation, and cloud or hybrid environments. Produce clean, documented code and developer workflows, and collaborate with SRE and security teams to ensure safe, reliable, and self serviceable compute offerings.
About Magic
Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems.