As a Machine Learning Engineer, you will contribute directly to our machine learning infrastructure and to the ScalarLM open source codebase, and build large-scale language model applications on top of it. You'll operate at the intersection of high-performance computing, distributed systems, and cutting-edge machine learning research, developing the fundamental infrastructure that enables researchers and organizations worldwide to train and deploy large language models at scale. You will contribute code and performance improvements to the open source project, develop and optimize distributed training algorithms for large language models, and implement high-performance inference engines and optimization techniques. You will work on integration between the vLLM, Megatron-LM, and HuggingFace ecosystems, build tools for seamless model training, fine-tuning, and deployment, and optimize performance on advanced GPU architectures. You will also collaborate with the open source community on feature development and bug fixes, and research and implement new techniques for self-improving AI agents.