Built a token-wise likelihood visualizer for GPT-2 over the weekend. There are some interesting patterns and behaviors you can easily pick up from a visualization like this, such as induction heads and which kinds of words/grammar LMs like to guess.

Jan 24, 2023 · 4:52 AM UTC
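
A minimal sketch of how per-token likelihoods like the ones in this viz can be computed, assuming the Hugging Face transformers library and the small gpt2 checkpoint (the demo mentioned further down runs GPT2-xl on a server); the example text is hypothetical:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "def add(a, b):\n    return a + b"  # hypothetical example input
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = model(input_ids).logits  # (1, seq_len, vocab_size)

# Each position predicts the *next* token, so shift by one; the first token gets no score.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_log_probs = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

for tok, lp in zip(input_ids[0, 1:].tolist(), token_log_probs[0].tolist()):
    print(f"{tokenizer.decode([tok])!r}  {lp:.3f}")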

A particularly interesting example to run through this is source code, where the LM does much better (much lower perplexity) because of the regular structure. Indentations and punctuation are particularly easy wins for GPT-2.
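
The "much lower perplexity" falls straight out of the same per-token log probs; a tiny helper, assuming a tensor like token_log_probs from the sketch above:

import torch

def perplexity(token_log_probs: torch.Tensor) -> float:
    # Perplexity = exp(-mean token log prob); code's regular structure shows up directly as a lower value.
    return torch.exp(-token_log_probs.mean()).item()
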
You can also use this viz to probe GPT-2 for what it thinks about different topics, which is kind of fun. You can imagine extensions of this "fill in the blank" UX becoming useful for writing workflows.
Replying to @thesephist
This is so useful! Any thoughts on what it would take to turn something like this into an interactive web page people could try out for themselves? I wonder if one of the LLMs compiled to WebAssembly could handle this
You could definitely do this with transformers.js and a small model like gpt2-small, since the model needn't be large to have the pedagogical effect. I currently just have a demo that runs GPT2-xl on the server. One of the many things I haven't yet had time to make public 🫠
Replying to @thesephist
I’d love your thoughts on mapping a color space to probability. I once prototyped something similar and found the huge variance in likelihoods for different words made that a bit tricksy, but this looks really good.
My coloring algorithm is roughly:

min, max = mean(log_probs) ± 2.5 * stddev(log_probs)
hue = token_logprob.clamp(min, max).scale(0, 150)
color = f"hsl({hue}deg 60% 85%)"

Key is to scale probs to hue in the log space, and then clamp at µ ± 2.5 stddev.
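
Spelled out as a small self-contained Python function (a sketch of that recipe; numpy and the function name are assumptions, not part of the original demo):

import numpy as np

def token_colors(token_log_probs, k=2.5):
    """Map per-token log probs to pastel HSL colors: hue 0 (red) = surprising, hue 150 (green) = likely."""
    lp = np.asarray(token_log_probs, dtype=float)
    lo, hi = lp.mean() - k * lp.std(), lp.mean() + k * lp.std()
    # Clamp to mu ± k*sigma in log space, then scale linearly onto hue 0–150.
    hue = (np.clip(lp, lo, hi) - lo) / max(hi - lo, 1e-9) * 150
    return [f"hsl({h:.0f}deg 60% 85%)" for h in hue]

So a token at or below µ−2.5σ gets hue 0 (red), one at or above µ+2.5σ gets hue 150 (green), and everything in between is interpolated linearly.
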
Replying to @thesephist
Open-source? I was thinking of building a similar UI, as I’m sure many have. Would love to contribute. Awesome work!
Replying to @thesephist
Interesting results.