Collaborate with research scientists during algorithm design so that code is built for efficiency from the outset. Identify and communicate best practices for model development, and reformulate code that creates performance bottlenecks. Develop a framework for mixed-precision training of multi-task, multi-modality models in a heterogeneous distributed training environment. Research, implement, and test alternative formulations of fundamental DNN operations and AV-centric representations.
Enable safe and efficient deployment of PnP models. Qualifications include familiarity with the PyTorch API and its implementation, CUDA system design, asynchronous programming models, and heterogeneous compute using C++/libtorch. Bonus points for experience with machine learning model architecture design, TensorRT, CUDA kernel implementation, and multi-sensor model architectures for autonomous vehicles.