Design, build, and maintain large-scale distributed training infrastructure for Ads ML models. Develop tools and frameworks on top of the Ray platform. Build tools to debug, profile, and tune distributed training jobs for performance and reliability. Integrate with object storage systems and improve data access patterns.