“GPT-3 Nonfiction § Bender & Koller 2020”, Gwern, 2020-06-19:

Nonfiction writing by OpenAI’s GPT-3 model, testing logic, commonsense reasoning, anagrams, PDF/OCR cleaning, creative nonfiction, etc.

“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020 (awarded ACL 2020’s “Best theme paper”) criticizes neural language models (NLMs), arguing on philosophical grounds that such models can never truly understand anything, as they lack communicative intent and other properties intrinsically necessary for genuine understanding of language & concepts.

They offer two predictions, pre-registered as it were before GPT-3, about test cases they claim NLMs will never handle: a vignette about a bear chasing a hiker (Appendix A), and the simple arithmetic problem “Three plus five equals”, written in words rather than digits (Appendix B), commenting:

It is clear that GPT-2 has learned what activity words tend to co-occur with bears and sticks (strap them to your chest, place the sticks, kill the bear, take your gun), but none of these completions would be helpful to A. We think this is because GPT-2 does not know the meaning of the prompt and the generated sentences, and thus cannot ground them in reality.

…To get a sense of how existing LMs might do at such a task, we let GPT-2 complete the simple arithmetic problem Three plus five equals. The five responses below, created in the same way as above, show that this problem is beyond the current capability of GPT-2, and, we would argue, any pure LM.

As with the stapler question, not only are “pure LMs” capable of solving both tasks in principle, they already solve them, as shown below with GPT-3.
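For anyone wanting to replicate the arithmetic probe themselves, here is a minimal sketch against the OpenAI completions API. It assumes the classic `openai` Python client and the base “davinci” (GPT-3) engine; the sampling parameters are illustrative, not the exact settings used for the completions shown here.

```python
# Minimal sketch: re-running Bender & Koller's "Three plus five equals" probe
# against GPT-3 via the OpenAI completions API. Assumes the classic `openai`
# Python client (pre-1.0) and the original "davinci" engine; max_tokens and
# temperature are illustrative choices, not the exact settings used here.
import openai

openai.api_key = "sk-..."  # your API key

PROMPT = "Three plus five equals"

# Draw 5 independent completions, mirroring Bender & Koller's 5-sample GPT-2 setup.
for i in range(5):
    response = openai.Completion.create(
        engine="davinci",   # GPT-3 base model
        prompt=PROMPT,
        max_tokens=5,       # the expected answer is a single word ("eight")
        temperature=0.7,    # moderate sampling; 0 would give the argmax completion
    )
    print(f"{i + 1}: {PROMPT}{response.choices[0].text}")
```

The bear vignette of Appendix A can be tested the same way by substituting its full prompt text for `PROMPT` and raising `max_tokens` to accommodate a longer continuation.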