“Investigating the Existence of ‘Secret Language’ in Language Models”, Yimu Wang, Peng Shi, Hongyang Zhang (2023-07-24):

In this paper, we study the problem of secret language in NLP, where current language models (LMs) seem to have a hidden vocabulary that allows them to interpret absurd inputs as meaningful concepts. We investigate two research questions: “Does the secret language phenomenon exist in different language models?” and “Does secret language depend on specific context?”

To answer these questions, we introduce a novel method named SecretFinding, a gradient-based approach that can automatically discover secret languages in LMs. We conduct experiments on 5 representative models (ELECTRA, ALBERT, RoBERTa, DistilBERT, and CLIP) fine-tuned on 4 NLP benchmarks (SST-2, MRPC, SNLI, and SQuAD) and a language-grounding benchmark (MS COCO).
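The abstract describes SecretFinding only as "gradient-based", so the following is a hedged sketch of the general idea, not the authors' method: using the gradient of a model's output with respect to a token's embedding, search for a replacement token that is far away in embedding space (semantically dissimilar) yet predicted, to first order, to leave the output nearly unchanged. The toy embedding matrix, linear head, and the trade-off weight `10.0` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                      # toy vocabulary size and embedding dimension
E = rng.normal(size=(V, d))       # toy token-embedding matrix (assumption)
w = rng.normal(size=d)            # toy linear "classifier" head (assumption)

def logit(tokens):
    """Toy model: mean-pooled token embeddings through a linear head."""
    return E[tokens].mean(axis=0) @ w

def find_secret_substitute(tokens, pos):
    """Pick a token dissimilar in embedding space to tokens[pos] whose
    substitution is predicted (first-order, via the gradient of the logit
    w.r.t. that position's embedding) to barely change the output."""
    grad = w / len(tokens)                    # d(logit) / d(E[tokens[pos]])
    orig = E[tokens[pos]]
    best, best_score = None, -np.inf
    for cand in range(V):
        if cand == tokens[pos]:
            continue
        delta = E[cand] - orig
        output_shift = abs(delta @ grad)      # first-order change in the logit
        dissimilarity = np.linalg.norm(delta) # distance in embedding space
        # Reward dissimilar tokens, penalize predicted output change;
        # the weight 10.0 is an arbitrary illustrative trade-off.
        score = dissimilarity - 10.0 * output_shift
        if score > best_score:
            best, best_score = cand, score
    return best

tokens = [3, 17, 42]
sub = find_secret_substitute(tokens, pos=1)
print(sub, abs(logit([3, sub, 42]) - logit(tokens)))
```

In a real LM the gradient would come from backpropagation through the fine-tuned network rather than a closed form, and the search would typically iterate over the most important positions in the sentence.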

Our experimental results show that even when we replace the most important words in a sentence with words that are semantically dissimilar to the originals, LMs do not treat the new sentence as semantically dissimilar to the original: with high probability, the output does not change. This phenomenon holds across all 5 models and 5 tasks, giving a positive answer to the first research question.

As for the second research question, we find that the secret language discovered by SecretFinding is quite general and can even be transferred to other models in black-box settings, such as GPT-3 and ChatGPT.

Finally, we discuss the causes of secret language, how to eliminate it, the potential connection to memorization, and ethical implications. Examples of secret language found by SecretFinding are available at https://huggingface.co/spaces/anonauthors/SecretLanguage.