Job Description

Own the text-to-SQL accuracy problem end-to-end: design evals, iterate prompts, and improve retrieval/routing.
Build and operate the experimentation and evaluation loop (automatic evals, regression suites, dataset curation).
Design pragmatic LLM application architectures (RAG, agent routing, tool-use orchestration) optimized for accuracy and latency.
Ship production-grade code and support deployments; instrument, monitor, and troubleshoot model behavior in real customer environments.
Partner closely with engineering and customers to improve semantic models, SQL generation, and data alignment.
Create feedback loops from users to systematically capture issues and convert them into measurable improvements.
Contribute to automation of environment provisioning and dev workflows to enable fast iteration.

About Bobsled

Bobsled is building AI-powered analytics experiences that turn natural language into accurate, production-grade insights.
