"DREAM: Uncovering Mental Models behind Language Models", 2021-12-16:
To what extent do language models (LMs) build "mental models" of a scene when answering situated questions (e.g. questions about a specific ethical dilemma)? While cognitive science has shown that mental models play a fundamental role in human problem-solving, it is unclear whether the high question-answering performance of existing LMs is backed by similar model building, and if not, whether that can explain their well-known catastrophic failures.
We observed that Macaw, an existing LM based on T5 ("Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", 2019), when probed provides somewhat useful but inadequate mental models for situational questions (estimated accuracy = 43%, usefulness = 21%, consistency = 42%). We propose DREAM, a model that takes a situational question as input and produces a mental model elaborating the situation, without any additional task-specific training data for mental models. It inherits its social commonsense through distant supervision from existing NLP resources.
Our analysis shows that DREAM can produce better mental models (estimated accuracy=67%, usefulness=37%, consistency=71%) compared to Macaw. Finally, mental models generated by DREAM can be used as additional context for situational QA tasks. This additional context improves the answer accuracy of a Macaw zero-shot model by between +1% and +4% (absolute) on 3 different datasets.
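The pipeline described above, generating a mental model from a situational question and prepending it as extra context for a zero-shot QA model, can be sketched as plain prompt construction. Everything below is a minimal illustrative stub: the function names, the hard-coded elaboration, and the prompt format are assumptions, not the paper's actual implementation (DREAM and Macaw are real models, but this does not call them).

```python
# Hypothetical sketch: use a generated "mental model" as additional
# context for situational QA. The elaboration dimensions shown are
# illustrative stand-ins, not the paper's exact output format.

def generate_mental_model(situation: str) -> str:
    """Stand-in for DREAM: elaborate the situation into a short
    scene description (here a canned example, not a model call)."""
    return ("[motivation] the speaker wants to be kind. "
            "[consequence] a lie may erode trust later.")

def build_qa_prompt(question: str, situation: str) -> str:
    """Prepend the elaborated scene so a zero-shot QA model
    (e.g. Macaw) answers with the extra context."""
    mental_model = generate_mental_model(situation)
    return (f"Context: {mental_model}\n"
            f"Situation: {situation}\n"
            f"Question: {question}")

prompt = build_qa_prompt(
    "Is it okay to tell a white lie here?",
    "A friend asks if you like their new haircut.",
)
print(prompt)
```

The only substantive move is that the QA model never sees the bare question: it always sees the elaborated scene first, which is where the reported +1% to +4% accuracy gains come from.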