Job Description
Contribute to a multilingual AI response evaluation project focused on reviewing large language model (LLM) outputs in different languages. You'll shape the future of AI in your native language, working from home with flexible hours. You will review short, pre-segmented datasets and evaluate model-generated replies for qualities such as tone and fluency: for each task, you read a user prompt and two model replies, rate each on a five-point scale, and provide a short rationale for extreme ratings. Projects involve determining whether replies are helpful, engaging, fair, and appropriately formal, as well as assessing grammatical accuracy, clarity, coherence, and natural flow.
About CrowdGen
Through Project Spearmint, CrowdGen focuses on reviewing large language model outputs across different languages.