ā€œClimbing towards NLU: On Meaning, Form, and Understanding in the Age of Dataā€, Emily M. Bender & Alexander Koller (2020-07):

The success of the large neural language models on many NLP tasks is exciting. However, we find that these successes sometimes lead to hype in which these models are being described as ā€œunderstandingā€ language or capturing ā€œmeaning.ā€ In this position paper, we argue that a system trained only on form has a priori no way to learn meaning. In keeping with the ACL 2020 theme of ā€œTaking Stock of Where We’ve Been and Where We’re Goingā€, we argue that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.

…In this paper, we have argued that in contrast to some current hype, meaning cannot be learned from form alone. This means that even large language models such as BERT do not learn ā€œmeaningā€; they learn some reflection of meaning into the linguistic form which is very useful in applications. We have offered some thoughts on how to maintain a healthy, but not exaggerated, optimism with respect to research that builds upon these LMs. In particular, this paper can be seen as a call for precise language use when talking about the success of current models and for humility in dealing with natural language. With this we hope to encourage a top-down perspective on our field which we think will help us select the right hill to climb toward human-analogous NLU.

…Appendix B: GPT-2 and arithmetic

Tasks like DROP (Dua et al 2019) require interpretation of language into an external world; in the case of DROP, the world of arithmetic. To get a sense of how existing LMs might do at such a task, we let GPT-2 complete the simple arithmetic problem ā€œThree plus 5 equalsā€.

The 5 responses below, sampled in the same way as above, show that this problem is beyond the current capability of GPT-2, and, we would argue, beyond that of any pure LM.
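A probe of this kind can be approximated with the Hugging Face `transformers` library. The sketch below samples 5 completions of the same arithmetic prompt from the publicly released GPT-2 checkpoint; the paper does not specify its decoding settings, so the sampling parameters and seed here are assumptions, not the authors' exact setup:

```python
# Sketch: sampling completions of an arithmetic prompt from GPT-2.
# Assumes the Hugging Face `transformers` library and the public
# "gpt2" checkpoint; decoding parameters are illustrative only.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # arbitrary seed for reproducible sampling

prompt = "Three plus 5 equals"
completions = generator(
    prompt,
    max_new_tokens=10,       # a short continuation is enough to see the answer
    do_sample=True,          # sampling is required for multiple distinct outputs
    num_return_sequences=5,  # mirror the paper's 5 responses
)
for c in completions:
    print(c["generated_text"])
```

Inspecting the printed continuations for the token ā€œ8ā€ (or ā€œeightā€) gives a quick, if informal, read on whether the model maps the form of the question onto the arithmetic fact it denotes.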