“Do Llamas Work in English? On the Latent Language of Multilingual Transformers”, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West (2024-02-16):

We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language—a question of key importance for understanding how language models function and the origins of linguistic bias. Focusing on the LLaMA-2 family of transformer models, our study uses carefully constructed non-English prompts with a unique correct single-token continuation.
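
As a concrete illustration of this prompt design, here is a hedged sketch (our own illustration, not the paper's data): it filters candidate translation pairs down to those whose target word occupies exactly one token under the Llama-2 tokenizer, so a single next-token distribution fully determines whether the model answers correctly. The translation template, word pairs, and model name are assumptions.

```python
# Hedged sketch of the prompt design (illustrative template and word pairs,
# not the paper's exact data).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def is_single_token(word: str, context: str = '中文: "') -> bool:
    # Tokenize the word inside its prompt context: SentencePiece can split
    # a word differently in isolation than after surrounding text.
    base = tokenizer.encode(context, add_special_tokens=False)
    full = tokenizer.encode(context + word, add_special_tokens=False)
    return full[: len(base)] == base and len(full) == len(base) + 1

# Hypothetical French -> Chinese candidates; keep only single-token targets.
candidates = [("fleur", "花"), ("montagne", "山"), ("livre", "书")]
pairs = [(fr, zh) for fr, zh in candidates if is_single_token(zh)]

# Few-shot prompt: demonstrations, then a final pair with the answer withheld.
demos, (fr_q, zh_q) = pairs[:-1], pairs[-1]
prompt = "".join(f'Français: "{fr}" - 中文: "{zh}"\n' for fr, zh in demos)
prompt += f'Français: "{fr_q}" - 中文: "'
print(prompt)  # the unique correct single-token continuation is zh_q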

From layer to layer, transformers gradually map an input embedding of the final prompt token to an output embedding from which next-token probabilities are computed. Tracking intermediate embeddings through their high-dimensional space reveals 3 distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space.
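
The layer-by-layer decoding described here can be reproduced with a logit-lens-style probe: project each layer's hidden state at the final prompt position through the model's own output path. A minimal sketch, assuming access to meta-llama/Llama-2-7b-hf via Hugging Face transformers (the prompt is illustrative):

```python
# Minimal logit-lens-style sketch; model choice and prompt are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = 'Français: "fleur" - 中文: "'
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds num_layers + 1 tensors of shape
# (batch, seq_len, hidden_dim): the embedding output plus each block's
# output. Decode the final position at every layer through the model's own
# output path: the final RMSNorm followed by the unembedding matrix.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.model.norm(h[:, -1]))
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer:2d}: {tokenizer.decode([top_id])!r}")
```

On a prompt like this, the 3 phases would be expected to appear as noise in the early layers, an English decoding such as “flower” in the middle layers, and the input-language token “花” only near the final layers.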

We cast these results into a conceptual model where the 3 phases operate in “input space”, “concept space”, and “output space”, respectively. Crucially, our evidence suggests that the abstract “concept space” lies closer to English than to other languages, which may have important consequences regarding the biases held by multilingual language models.