“Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process”, Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu, 2024-07-29:

Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems.

We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model’s hidden (mental) reasoning process? [non-myopic] (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? [yes] (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? [depth > width]

Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.