“What Can a Generative Language Model Answer About a Passage?”, Douglas Summers-Stay, Claire Bonial, Clare Voss (2021-11-10):

Generative language models trained on large, diverse corpora can answer questions about a passage by generating the most likely continuation of the passage followed by a question/answer pair. However, accuracy rates vary depending on the type of question asked.

In this paper, we keep the passage fixed and test a wide variety of question types, exploring the strengths and weaknesses of the GPT-3 language model.

We provide the passage and test questions as a challenge set for other language models.

4.2.3 Reasoning: The most challenging questions we posed were those requiring some kind of reasoning process to arrive at the answer. There has been some success at getting GPT to follow a reasoning process correctly by providing examples of the reasoning steps and having it imitate those steps one at a time. In the zero-shot prompts we are using, however, reasoning beyond what was required for the earlier question types appeared to be beyond its capabilities. It is unclear to what extent these difficulties with reasoning lie with the architecture (a limited number of layers can carry out only so many sequential steps) or with the training set. Certainly other transformers trained on, for example, calculus problems (Lample & Charton, 2019) rather than web text are able to generate valid chains of reasoning.
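The contrast between the zero-shot setup used in the paper and the step-by-step (few-shot) alternative mentioned above can be sketched as plain prompt construction. This is an illustrative sketch only: the function names and prompt wording are assumptions, not the authors' actual prompt templates.

```python
def zero_shot_prompt(passage: str, question: str) -> str:
    """Zero-shot: the model must continue the passage directly
    with an answer, with no worked examples to imitate."""
    return f"{passage}\n\nQ: {question}\nA:"


def few_shot_reasoning_prompt(passage, examples, question):
    """Few-shot: each example is a (question, worked-answer) pair
    whose answer spells out the reasoning steps one at a time,
    giving the model a pattern to imitate."""
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{passage}\n\n{demos}\n\nQ: {question}\nA:"
```

In the zero-shot case the model sees only the passage and the bare question, which is the condition under which the paper reports reasoning failures; the few-shot variant supplies intermediate steps in the demonstrations themselves.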