“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, 2023-11-10:
Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attacking them is a novel human activity.
This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail.
We relate this activity to its practitioners’ motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays.
As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.