Using "natural language" to describe rationales, pioneered by (Ling et al 2017) and then followed by (Cobbe et 2021, Wei et al 2022), is essential for the success of chain of thought prompting.
1/
A simple example is the addition task (Nye et al., 2021), where "scratchpad" finetuning shows impressive results. Does the same format work with prompting? It turns out that scratchpad prompting accuracy is nearly 0 (tested on GPT-3 002, two-digit numbers, with many shots). A rough sketch of such a prompt is below.
2/
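For reference, here is a minimal sketch of what a scratchpad-style few-shot prompt for 2-digit addition might look like. The exact scratchpad format, digit spacing, and shot count here are assumptions loosely modeled on Nye et al. (2021), not the exact prompt used in the experiment above.

```python
# A rough sketch of a scratchpad-style few-shot prompt for 2-digit addition,
# loosely following the Nye et al. (2021) scratchpad format. The exact format,
# digit spacing, and number of shots are assumptions for illustration only.

SCRATCHPAD_SHOT = """Input:
2 8 + 5 7
Target:
<scratch>
2 8 + 5 7 , C: 0
2 + 5 , 5 C: 1
, 8 5 C: 0
</scratch>
8 5"""


def build_scratchpad_prompt(a: int, b: int, shots: list[str]) -> str:
    """Concatenate few-shot scratchpad exemplars and append the query digits."""
    query = "Input:\n" + " ".join(str(a)) + " + " + " ".join(str(b)) + "\nTarget:\n"
    return "\n\n".join(shots) + "\n\n" + query


# The model is asked to continue from "Target:". As noted above, this style of
# prompt yields nearly 0% accuracy on 2-digit addition.
print(build_scratchpad_prompt(36, 49, [SCRATCHPAD_SHOT]))
```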
Such a chain-of-thought prompt designed for 2-digit additions even generalizes to 3-digit additions, reaching 94% accuracy with only 3 shots! A sketch of this setup follows.
4/
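For comparison, here is a sketch of a natural-language chain-of-thought prompt whose exemplars are all 2-digit additions, applied to a 3-digit query. The wording of the rationales and the specific exemplars are made up for illustration; they are not the exact 3-shot prompt from the experiment.

```python
# A sketch of a natural-language chain-of-thought prompt for addition.
# The three exemplars are 2-digit problems; the query is a 3-digit problem,
# illustrating the length generalization described above. The rationale
# wording and exemplars are assumptions, not the exact prompt used.

COT_SHOTS = """Q: What is 28 plus 57?
A: 8 plus 7 is 15, so we write down 5 and carry the 1. 2 plus 5 plus the carried 1 is 8. So the answer is 85.

Q: What is 43 plus 19?
A: 3 plus 9 is 12, so we write down 2 and carry the 1. 4 plus 1 plus the carried 1 is 6. So the answer is 62.

Q: What is 36 plus 46?
A: 6 plus 6 is 12, so we write down 2 and carry the 1. 3 plus 4 plus the carried 1 is 8. So the answer is 82."""


def build_cot_prompt(a: int, b: int) -> str:
    """Append a new question to the 3-shot chain-of-thought exemplars."""
    return COT_SHOTS + f"\n\nQ: What is {a} plus {b}?\nA:"


# 3-digit query, even though all exemplars are 2-digit additions.
print(build_cot_prompt(372, 459))
```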
Jul 14, 2022 · 7:22 PM UTC