Job Description
Build and maintain data pipelines for large language model (LLM) training and evaluation, curate user-understanding signals (such as intents, preferences, and behavioral features), and ensure data quality, privacy, and proper dataset management.
Develop and manage labeling and feedback loops, including heuristics, annotation jobs, and prompt-based labeling, to create high-quality corpora, collaborating with Data Engineering and Applied Science partners to improve data coverage and reduce noise.
Design, prototype, and ship agentic AI solutions to production, including multi-agent systems built with frameworks like LangGraph, and implement context-aware features in partnership with senior engineers.
Implement an evaluation framework to measure model quality on offline test sets (accuracy, bias, safety, user-intent coverage), and build dashboards to track improvements over time.
Lead and contribute to experimentation by implementing metrics, A/B tests, and monitoring, and help harden prototypes for reliable rollouts.
Collaborate with senior engineers and cross-functional partners to select the right technologies, participate in code reviews, and share best practices (including mentoring interns or new hires as needed).
Summarize research findings and model evaluations in clear write-ups and demos for the team and cross-functional stakeholders.
Stay current on emerging agentic AI paradigms, implement paper-inspired proofs of concept, and contribute insights to the team roadmap.
About Zillow
Zillow is reimagining real estate to make home a reality for more and more people by helping movers find and win their home through digital solutions.