Similar Jobs
AI Platform Engineer, Applied AI | Circle | Global | Python, Rails, RAG
AI Agent Architect, Customer Experience | Airtable | US | Prompt Engineering, APIs
Sr. Applied AI Engineer | AssetWatch | US | Python, SQL, LLM
Senior AI Engineer | League | Canada | Python, MLOps, GCP
AI Trainer - Oracle SQL Queries | CrowdGen | Global | SQL, Analytical
Agentic RAG Pipeline Evaluation & Optimization:
- Design evaluation datasets (synthetic query-answer pairs, adversarial cases, real user query sets).
- Measure retrieval quality using Recall@k, Precision@k, MRR, and NDCG@k, and assess which metrics suit each use case.
- Evaluate and optimize chunking strategies; benchmark embedding models and re-rankers.
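For readers less familiar with the retrieval metrics named above, here is a minimal sketch of how they are commonly computed. It assumes binary relevance (each retrieved chunk is either in a query's relevant set or not) and string chunk IDs; the function names and the example IDs are hypothetical, and graded-relevance NDCG is not covered.

```python
import math
from typing import Sequence, Set, Tuple

# Hypothetical helpers: chunk IDs are strings, relevance is binary.


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one pair per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the actual top k divided by DCG of an ideal top k."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # The ideal ranking puts every relevant chunk first (up to k of them).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Usage: one query whose relevant chunks are c1 and c2.
ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
print(precision_at_k(ranked, relevant, 3))  # 1 relevant of 3 retrieved
print(recall_at_k(ranked, relevant, 3))     # 1 of 2 relevant found -> 0.5
print(reciprocal_rank(ranked, relevant))    # first hit at rank 2 -> 0.5
print(ndcg_at_k(ranked, relevant, 3))
```

The metrics answer different questions: Recall@k suits pipelines that pass many chunks to the generator, while MRR and NDCG@k reward putting the right chunk near the top, which matters more when only a few chunks fit in the prompt.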
Broader AI/ML Evaluation:
- Conduct systematic error analysis through trace reading and failure mode identification.
- Design and validate LLM-as-Judge evaluators, refining them iteratively and measuring their true positive rate (TPR) and true negative rate (TNR).
- Build and maintain golden datasets for CI regression testing of AI pipelines.
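Validating an LLM-as-Judge evaluator usually means comparing its verdicts with human labels on the same examples. The sketch below assumes boolean labels where True means "pass" (the positive class) and treats the human label as ground truth; the function name and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple


def judge_tpr_tnr(human: Sequence[bool], judge: Sequence[bool]) -> Tuple[float, float]:
    """Compare LLM-judge verdicts with human labels on the same examples.

    True means "pass" (the positive class). Returns (TPR, TNR):
    TPR = share of human-passed examples the judge also passes,
    TNR = share of human-failed examples the judge also fails.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge labels must be the same length")
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h and j)
    fn = sum(1 for h, j in pairs if h and not j)
    tn = sum(1 for h, j in pairs if not h and not j)
    fp = sum(1 for h, j in pairs if not h and j)
    # NaN signals that the labeled set contains no examples of that class.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


human = [True, True, True, False, False]
judge = [True, False, True, False, True]
tpr, tnr = judge_tpr_tnr(human, judge)
print(tpr, tnr)  # judge agreed on 2 of 3 passes and 1 of 2 fails
```

Reporting both rates, rather than raw accuracy, shows whether a judge is too lenient (low TNR) or too strict (low TPR). In a CI regression check on a golden dataset, one option is to fail the build when either rate falls below a threshold the team chooses.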
Collaboration & Data Review:
- Partner with Product to translate product requirements into measurable evaluation criteria.
- Partner with Engineering to instrument pipelines for observability and integrate evaluation checks into CI/CD.
- Lead or facilitate annotation workflows, measure inter-annotator agreement (Cohen's Kappa), and produce labeled datasets.
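Cohen's Kappa corrects raw agreement between two annotators for the agreement expected by chance. A minimal sketch, assuming each annotator assigns one categorical label per item and both label the same items in the same order; the function name and sample labels are hypothetical:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each annotator's label rates.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: both annotators independently pick the same label.
    p_e = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


a = ["pass", "pass", "fail", "pass", "fail", "fail"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(cohens_kappa(a, b))  # 4/6 observed vs 0.5 chance agreement
```

Here the annotators agree on 4 of 6 items, but because each splits evenly between "pass" and "fail", half of that agreement is expected by chance, so kappa comes out near 0.33. scikit-learn's `cohen_kappa_score` computes the same unweighted statistic if a library implementation is preferred.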
Jump
Jump empowers financial advisors, firms, and clients to thrive in the age of AI by automating tasks like meeting prep and compliance. A Series A company, Jump has raised $30M and grown to more than 100 employees, including leaders from top companies and schools. It fosters a culture of velocity, world-class standards, direct communication, and kindness.