“CQK Is The First Unused TLA”, Gwern 2023-09-29:

Curious what the first ‘unused’ alphabetic acronym is, I have GPT-4 write a script to check English Wikipedia. After three bugs, the first unused one turns out, as of 2023-09-29, to be the three-letter acronym ‘CQK’, with another 2.6k TLAs unused and 393k four-letter acronyms unused. Exploratory analysis suggests alphabetical-order effects as well as letter-frequency effects.

It sometimes seems as if everything that could be trademarked has been, and as if every possible three-letter acronym (TLA) has been used in some nontrivial way by someone. Is this true? No—actually, a fair number, starting with CQK, have no nontrivial use to date.

We could check by defining ‘nontrivial’ as ‘has an English Wikipedia article, disambiguation page, or redirect’, and then writing a script which simply looks up every possible TLA Wikipedia URL to see which ones exist. This is a little too easy, so I make it harder by making GPT-4 write a Bash shell script to do so (then Python to double-check).
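A minimal Python sketch of that lookup follows. It is not the GPT-4-written script, only an illustration under stated assumptions: that a page's existence can be read off the HTTP status of `https://en.wikipedia.org/wiki/<TLA>` (200 for an article, disambiguation page, or redirect; 404 otherwise), and that a `HEAD` request is enough. The names `all_acronyms`, `page_exists`, and `first_unused` are hypothetical.

```python
import itertools
import string
import urllib.error
import urllib.request

# Assumption: existence is judged from the HTTP status of this URL pattern.
BASE_URL = "https://en.wikipedia.org/wiki/"


def all_acronyms(length=3):
    """Yield every uppercase alphabetic acronym of the given length, in alphabetical order."""
    for letters in itertools.product(string.ascii_uppercase, repeat=length):
        yield "".join(letters)


def page_exists(title):
    """Return True if a HEAD request for the title succeeds, False on HTTP 404."""
    request = urllib.request.Request(
        BASE_URL + title,
        method="HEAD",
        headers={"User-Agent": "tla-check-sketch/0.1"},  # hypothetical identifier
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (rate limits, outages) should not count as 'unused'


def first_unused(exists=page_exists, length=3):
    """Return the alphabetically first acronym for which exists() is False, or None."""
    for acronym in all_acronyms(length):
        if not exists(acronym):
            return acronym
    return None
```

Calling `first_unused()` would issue one request per candidate until the first miss, so a real run should throttle its requests; passing a different `exists` function lets the search logic be checked without touching the network.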

GPT-4 does so semi-successfully, making self-reparable errors until it runs into its idiosyncratic ‘blind spot’ error. After it accidentally fixes that, the script appears to work successfully, revealing that, contrary to my expectation that every TLA exists, the first non-existent acronym is the TLA ‘CQK’, and that there are many unused TLAs (2,684 or 15% unused) and even more unused four-letter acronyms (392,884 or 85% unused). I provide the list of all unused TLAs & four-letter acronyms (as well as alphanumerical ones; the first unused alphanumerical one is AA0).
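The percentages can be recomputed from the quoted counts, assuming a 26-letter alphabet so that there are 26³ = 17,576 possible TLAs and 26⁴ = 456,976 possible four-letter acronyms. The helper name `unused_share` is hypothetical; on these figures the three-letter share is a little over 15% and the four-letter share sits just under 86%.

```python
ALPHABET_SIZE = 26  # assumption: plain A-Z acronyms only


def unused_share(unused_count, length):
    """Return (total possible acronyms, fraction of them unused) for a given length."""
    total = ALPHABET_SIZE ** length
    return total, unused_count / total


# Counts as quoted in the text above.
for length, unused in [(3, 2684), (4, 392884)]:
    total, share = unused_share(unused, length)
    print(f"length {length}: {unused:,} of {total:,} unused ({share:.1%})")
```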

Unused TLAs are not distributed at random: they are clearly enriched in letters like ‘J’ or ‘Z’ compared with ‘A’ or ‘E’. Additional GPT-4-powered analysis in R suggests that both letter-frequency & position in the alphabet predict unusedness to some degree, but leave much unexplained.