Job Description
As a Staff ML Infrastructure Engineer, you will lead development of a platform for large-scale ML models at Reddit. You will design end-to-end model lifecycle patterns (MLOps) that increase development velocity for ML engineers, covering data preparation, model management, experiment tracking, and more. You will build from zero to one, and then support, a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and faster iteration. You will also collaborate with ML engineers on performance tuning, including reducing model training time, improving efficiency, and lowering GPU training costs in a large, distributed ML training environment.
You will optimize batch data processing, both within a data warehouse and with tools such as Apache Beam, Apache Spark, and Ray Data. You will also architect pipelines that build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges.
About Reddit
Reddit is a community of communities built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.