The Capacity Engineering team manages global capacity operations through operations, engineering, and automation, and develops automation for end-to-end cloud capacity lifecycle management. This role provides relief and sustainable resolution to issues within the infrastructure. You will use experience in software development, systems engineering, and networking to proactively prevent recurring issues, and drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design. You will also drive a culture of intolerance to manual activity, resulting in a highly automated environment that delivers scalable solutions.
Remote Devops Jobs · C/C++
4 results
You would work on our pre-training team, which is building out distributed training and inference for Large Language Models (LLMs). This is a hands-on role that focuses on software development best practices, maintenance, and code architecture. You will have access to thousands of GPUs to verify changes.
You would work on our pre-training team, which is building out distributed training and inference for Large Language Models (LLMs). This is a hands-on role that focuses on software reliability and fault tolerance. You will work on cross-platform checkpointing, NCCL recovery, and hardware fault detection, and you will build high-level tools. You will have access to thousands of GPUs to test changes.
Help customers maximize the value they get out of OpenVM and the Axiom Proving API. Design compelling product demos, execute pilots, and work closely with teams across crypto and fintech to integrate ZK into their products with OpenVM. You will own technical customer relationships by customizing OpenVM for user needs and ensuring successful product impact. You will also help shape our product roadmap by deeply understanding and communicating customer needs.