USD/year
You will challenge advanced language models on topics such as verb conjugation, gender and number agreement, French idioms, phonetic nuances, sentence structure, and stylistic variation, documenting every failure mode so we can harden model reasoning. On a typical day, you will converse with the model about language scenarios, verify the factual accuracy and logical soundness of its responses, capture reproducible error traces, and suggest improvements to our prompt engineering and evaluation metrics.