‘LM tokenization’ directory

How text is converted into numeric tokens strongly affects how well machine learning models work. Different tokenization schemes lead to models that ‘think’ in different ways, and can cause subtle and surprising errors (especially with byte-pair encodings, or BPEs).
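
As a rough illustration of why BPEs in particular can mislead a model, here is a minimal sketch of byte-pair encoding on a toy word list. The corpus, the number of merges, and the helper names (`apply_merge`, `learn_bpe`, `encode`) are assumptions made up for this example; it is not the tokenizer of any particular model, and real implementations work on bytes and much larger corpora.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so the
        # result is deterministic (an arbitrary choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(list(symbols), best))] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Tokenize `word` by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols

# Hypothetical toy corpus: word frequencies chosen only for illustration.
words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(words, 4)

for w in ["low", "lowest", "newest"]:
    print(w, encode(w, merges))
```

With this toy corpus, `lowest` is encoded as the two tokens `low` and `est`, while `newest` is split into `n`, `e`, `w`, `est`. A model that receives token IDs never sees the individual letters inside `low` or `est`, and similar-looking words can be segmented very differently, which is one plausible source of the spelling, rhyming, and arithmetic errors attributed to BPEs.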

See Also

Gwern

“GPT-3 Creative Fiction”, Gwern 2020

“GPT-3 Nonfiction”, Gwern 2020

“GPT-2 Folk Music”, Gwern & Presser 2019

Miscellaneous