Built a token-wise likelihood visualizer for GPT-2 over the weekend. A visualization like this makes some interesting patterns and behaviors easy to spot, such as induction heads at work and which kinds of words/grammar LMs find easy to guess.
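For anyone curious how the per-token numbers behind such a visualizer can be computed, here's a minimal sketch using the Hugging Face transformers library. The `token_likelihoods` helper and the example sentence are illustrative assumptions, not the visualizer's actual code.

```python
# Sketch: per-token probabilities from GPT-2 via Hugging Face transformers.
# (Helper name and example input are assumptions, not the original tool.)
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_likelihoods(text):
    """Return (token, probability) pairs: the probability GPT-2
    assigned to each token given everything before it."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # logits[:, i] is the distribution over token i+1, so drop the last
    # position and shift the targets by one to line them up.
    probs = torch.softmax(logits[:, :-1], dim=-1)
    target_probs = probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(ids[0])[1:]
    return list(zip(tokens, target_probs.tolist()))

# The first token has no context, so it is skipped; coloring each
# remaining token by its probability gives the visualization.
for tok, p in token_likelihoods("The quick brown fox jumps over the lazy dog"):
    print(f"{tok!r}: {p:.3f}")
```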
A particularly interesting example to run through this is source code: because of its regular structure, the LM does much better (much lower perplexity). Indentation and punctuation are especially easy wins for GPT-2.
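One way to sanity-check the prose-vs.-code perplexity gap, reusing the model and tokenizer from the sketch above (the two sample strings are arbitrary choices, not the original experiment):

```python
import math

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean per-token
        # cross-entropy (the targets are shifted internally).
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("It was a dark and stormy night, and the rain fell."))
print(perplexity("for i in range(10):\n    print(i)\n"))
```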