What you’ll be doing:
- Drive the design, delivery, and evolution of the GenAI Core platform, the shared AI infrastructure used by all product teams at Pleo
- Own and evolve service architecture, APIs, and integration contracts, with a high bar for reliability and observability
- Set technical direction for key parts of the platform, including LLM routing, budget enforcement, observability pipelines, and evaluation infrastructure
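To make the routing and budget-enforcement responsibilities above concrete, here is a minimal sketch of what such a layer might look like. All names (`Provider`, `Router`, the pricing fields) are illustrative assumptions, not Pleo's actual platform API.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    # Hypothetical provider record: name, unit price, health flag
    name: str
    cost_per_1k_tokens: float
    healthy: bool = True

@dataclass
class Router:
    providers: list
    budget_remaining: float  # e.g. USD left for the calling team

    def route(self, estimated_tokens: int) -> str:
        """Pick the cheapest healthy provider whose estimated cost
        fits the remaining budget; fail fast if none qualifies."""
        est = lambda p: estimated_tokens / 1000 * p.cost_per_1k_tokens
        candidates = [
            p for p in self.providers
            if p.healthy and est(p) <= self.budget_remaining
        ]
        if not candidates:
            raise RuntimeError("no healthy provider within budget")
        choice = min(candidates, key=est)
        self.budget_remaining -= est(choice)  # enforce the budget on use
        return choice.name
```

A production version would also track actual (not estimated) token usage and persist budgets outside the process, but the core decision loop stays this simple.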
You’ll bring solid experience with:
- Designing and maintaining backend systems with strong requirements for reliability and observability
- Building and operating shared platforms or infrastructure components used by multiple teams
- Distributed systems fundamentals, including async workflows, idempotency, consistency, and designing for failure
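As one example of the idempotency fundamentals listed above, a handler can cache results by an idempotency key so that client retries replay the original outcome instead of repeating the side effect. The class and names here are an illustrative sketch, not an existing library.

```python
import threading

class IdempotentHandler:
    """Wraps a handler so repeated calls with the same idempotency
    key return the first result instead of re-running the effect."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}           # key -> result of first successful call
        self._lock = threading.Lock()

    def handle(self, idempotency_key, payload):
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # retry: replay stored result
        result = self._handler(payload)                # first attempt: run for real
        with self._lock:
            # setdefault keeps the first writer's result under races
            self._results.setdefault(idempotency_key, result)
        return self._results[idempotency_key]
```

In a real system the result store would be durable and scoped with a TTL, but the contract is the same: same key, same answer.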
Nice to have:
- Hands-on experience working with LLM APIs (OpenAI, Anthropic, AWS Bedrock, or similar) in a production system
- Familiarity with LLM evaluation patterns such as LLM-as-judge, vector similarity, and sampling strategies
- Experience building or operating proxy or routing layers for AI workloads
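The evaluation patterns mentioned above can be sketched without any real model calls: vector similarity as cosine distance between embeddings, and LLM-as-judge as repeated sampling of a judge callable with a majority vote. The `judge` argument is a stand-in for an actual LLM call.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def judge_vote(judge, question, answer, samples=5, threshold=0.5):
    """LLM-as-judge with a simple sampling strategy: query the judge
    several times and accept if a majority of samples approve.
    `judge` is a hypothetical callable returning 1 (pass) or 0 (fail)."""
    votes = sum(judge(question, answer) for _ in range(samples))
    return votes / samples >= threshold
```

Sampling the judge multiple times trades cost for variance reduction, which is why sampling strategy is listed alongside the judge pattern itself.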