"Language Model Inversion", 2023-11-22:
Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output.
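One plausible way to picture such an inverter, sketched below purely for intuition (the class name, chunking scheme, and generic Transformer backbone are illustrative assumptions, not the paper's exact architecture): reshape the next-token probability vector into a short sequence of pseudo-embeddings and train an encoder-decoder to emit the hidden prompt from them.

```python
import torch
import torch.nn as nn

class ProbVectorInverter(nn.Module):
    """Conceptual sketch: map a next-token probability vector back to prompt
    tokens. Names and hyperparameters are illustrative assumptions."""

    def __init__(self, vocab_size: int, d_model: int = 512, chunk: int = 64):
        super().__init__()
        assert vocab_size % chunk == 0, "pad the vocabulary so it chunks evenly"
        self.chunk = chunk
        self.proj = nn.Linear(chunk, d_model)           # chunk -> pseudo-embedding
        self.embed = nn.Embedding(vocab_size, d_model)  # prompt-token embeddings
        self.seq2seq = nn.Transformer(d_model=d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)   # predict prompt tokens

    def forward(self, next_token_probs: torch.Tensor,
                target_tokens: torch.Tensor) -> torch.Tensor:
        # (batch, vocab) -> (batch, vocab // chunk, chunk): the probability
        # vector becomes a short "sequence" the encoder can attend over.
        b, v = next_token_probs.shape
        src = self.proj(next_token_probs.view(b, v // self.chunk, self.chunk))
        tgt = self.embed(target_tokens)  # teacher-forced hidden prompt
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        hidden = self.seq2seq(src, tgt, tgt_mask=tgt_mask)
        return self.lm_head(hidden)      # logits over prompt tokens
```

Trained with cross-entropy against the true prompt tokens, a model of this shape learns to read prompt information directly out of the distribution.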
We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On LLaMA-2 7B, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78, and recovers 27% of prompts exactly.
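For intuition, here is a minimal sketch of how such a search could work under one common restricted-access setting (argmax-only output plus a per-token logit bias); the function names and API shape are assumptions, not the paper's exact procedure.

```python
import math

# `query_argmax` is a placeholder (an assumption, not a real API): it should
# return the argmax next-token id for `prompt` after adding `logit_bias`.
def query_argmax(prompt: str, logit_bias: dict[int, float]) -> int:
    raise NotImplementedError("replace with the actual model / API call")

def logit_gap(prompt: str, token_id: int,
              lo: float = 0.0, hi: float = 50.0, tol: float = 1e-3) -> float:
    """Binary-search the smallest bias that makes `token_id` the argmax.

    That bias approximates max_logit - logit[token_id], so running it for
    every token recovers all logits up to a shared additive constant.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if query_argmax(prompt, {token_id: mid}) == token_id:
            hi = mid   # bias is large enough; try a smaller one
        else:
            lo = mid   # bias is too small; search upward
    return hi

def recover_distribution(prompt: str, vocab_size: int) -> list[float]:
    """Turn the per-token logit gaps back into softmax probabilities."""
    gaps = [logit_gap(prompt, t) for t in range(vocab_size)]
    exps = [math.exp(-g) for g in gaps]   # proportional to exp(logit_t)
    z = sum(exps)
    return [e / z for e in exps]
```

Each binary search costs a logarithmic number of queries in the bias range and tolerance, so the full probability vector is recoverable with a bounded number of calls per vocabulary token.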
Code for reproducing all experiments is available at vec2text.