USD/year
Review short, pre-segmented datasets. For each item, read a user prompt and two model replies, rate each reply on a five-point scale for Tone or Fluency, and provide a short rationale for any extreme rating.