Job Description
This person will be instrumental in architecting, implementing, and maintaining foundational Machine Learning Training infrastructure that powers Feeds Ranking, Content Understanding, Recommendations and much more to fulfill Redditβs mission of bringing community and belonging to everyone in the world. They will oversee GPU optimization in AI/ML batch workloads and debug performance bottlenecks in GPU workloads. They will build and own systems and tools that enable MLEs and data scientists, and continuously improve the ML software development lifecycle.
Optimize model training on GPUs Lead the building, testing, and maintenance of ML infrastructure at Reddit Propose, design, and implement high-performance ML platform solutions that significantly advance the deployment of models that serve millions of redditors a seamless experience for MLEs. Analyze bottlenecks in distributed systems and optimize for performance and cost-efficiency.
About Reddit
Reddit is a community of communities built on shared interests, passion, and trust and is home to the most open and authentic conversations on the internet.