AI Model Serving Specialist

Rackspace

Remote regions

US

Salary range

$82,300–$140,580/year

Job Description

Enable enterprise customers to operationalize AI workloads by deploying and optimizing model-serving platforms. Responsibilities include:

- Package and deploy ML/LLM models on Triton, vLLM, or KServe within Kubernetes clusters, and tune performance for latency and throughput SLAs.
- Integrate models with Rackspace's Unified Inference API and API Gateway for multi-tenant routing, supporting RAG and agentic workflows by connecting to vector databases and context stores.
- Configure telemetry for GPU utilization, request tracing, and error monitoring.
- Collaborate with FinOps to enable usage metering and chargeback reporting.
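As a rough illustration of the SLA-tuning and request-tracing work described above, the sketch below computes p95 latency and throughput from a batch of request traces. The trace data, the nearest-rank percentile helper, and the variable names are all hypothetical, not part of any Rackspace tooling:

```python
# Hypothetical request traces: (start_s, end_s) timestamps for each inference call.
traces = [(0.0, 0.42), (0.1, 0.38), (0.2, 0.95), (0.3, 0.51), (0.4, 0.47)]

latencies = sorted(end - start for start, end in traces)

def percentile(sorted_vals, p):
    # Nearest-rank percentile: index of the smallest value covering p% of samples.
    k = max(0, int(round(p / 100 * len(sorted_vals) + 0.5)) - 1)
    return sorted_vals[min(k, len(sorted_vals) - 1)]

p95 = percentile(latencies, 95)

# Throughput measured over the full observed window, in requests per second.
window = max(end for _, end in traces) - min(start for start, _ in traces)
throughput = len(traces) / window

print(f"p95 latency: {p95:.2f}s, throughput: {throughput:.2f} req/s")
```

In practice these numbers would come from the tracing backend rather than an in-memory list, and would be compared against the customer's contractual SLA targets.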

About Rackspace

We combine our expertise with the world’s leading technologies — across applications, data and security — to deliver end-to-end solutions.
