Responsibilities:
- Architect and optimize distributed training and inference systems for large-scale AI models, ensuring scalability and performance.
- Design and deliver customer-focused solutions, lead the transition of ML pipelines from proof of concept (POC) to scalable production, and build long-term customer relationships.
- Create whitepapers, deliver technical presentations, host webinars, provide technical leadership, and mentor teams on AI infrastructure and deployment strategies.
Qualifications:
- 5+ years of experience with cloud technologies and infrastructure, ideally in senior MLOps or Solutions Architect roles, with proven expertise in scaling AI workloads.
- Deep knowledge of ML frameworks such as PyTorch and JAX, and a strong background in the NVIDIA HPC ecosystem (CUDA, NCCL, InfiniBand).
- Exceptional communication skills for engaging both technical teams and business stakeholders.
- Legal authorization to work in the United States without sponsorship.
Compensation and Benefits:
- Competitive compensation ranging from $225,000 to $315,000 per year, with full benefits including 100% company-paid medical, dental, and vision coverage.
- 401(k) plan with a 4% company match, stock option plan, flexible remote work environment, and company-paid short-term and long-term disability and life insurance.
- 20 weeks of paid parental leave for primary caregivers and 12 weeks for secondary caregivers, plus up to $85/month for mobile and internet expenses.