Model Performance Engineer 27 days ago Benchmark FP8 quantization across GPU families and ship a production config to achieve speedup.Evaluate serving frameworks with speculative decoding to improve performance.Build a fine-tuning pipeline to enable faster model training and deployment. Python LLM CUDA View details Similar jobs