- Design and execute complex jailbreak attempts to identify vulnerabilities in state-of-the-art models.
- Apply your background in linguistics or the social sciences to surface "hidden" biases or harms that standard automated filters miss.
- Systematically rank LLM outputs to pinpoint where safety guardrails succeed and where they fail.