Using "natural language" to describe rationales, pioneered by (Ling et al 2017) and then followed by (Cobbe et al 2021, Wei et al 2022), is essential to the success of chain-of-thought prompting. 1/
A simple example is the addition task (Nye et al 2021), where “scratchpad” finetuning gives impressive results. Does it work with prompting? It turns out the prompting accuracy is nearly 0 (tested on GPT-3 002, two-digit numbers, with many shots). 2/
Now let’s write a natural language rationale that “naturally” describes the logic of addition (see fig below). Such a chain-of-thought prompt achieves 99% accuracy with just 1 shot! 3/
Such a chain-of-thought prompt designed for 2-digit additions even generalizes to 3-digit additions, reaching 94% accuracy with 3 shots! 4/
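The kind of prompt described above can be sketched as a tiny template (a minimal illustration; the exemplar rationale here is my own wording, not the exact prompt from the figure):

```python
# A hypothetical 1-shot chain-of-thought prompt for 2-digit addition.
# The worked exemplar spells out the carry logic in natural language.
def cot_addition_prompt(a, b):
    exemplar = (
        "Q: What is 47 + 25?\n"
        "A: The ones digits are 7 and 5; 7 + 5 = 12, so we write 2 and carry 1. "
        "The tens digits are 4 and 2; 4 + 2 + 1 = 7. So 47 + 25 = 72.\n\n"
    )
    # The model is then asked to continue after "A:" with its own rationale.
    return exemplar + f"Q: What is {a} + {b}?\nA:"
```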

Jul 14, 2022 · 7:22 PM UTC

Natural language based rationales are also critical to the success of self-consistency (Wang et al 2022). That is because natural language can greatly diversify the generated solutions that lead to the same correct answer. 5/
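The idea can be sketched in a few lines: sample several chains of thought, parse out each final answer, and take a majority vote. The sampled rationales below are made-up stand-ins for model generations, not real outputs:

```python
from collections import Counter

# Hypothetical sampled rationales for "What is 47 + 25?" — note the
# diverse solution paths that still reach the same final answer.
samples = [
    "7 + 5 = 12, write 2 carry 1; 4 + 2 + 1 = 7. So the answer is 72",
    "47 + 25 is 47 + 20 + 5 = 67 + 5 = 72. So the answer is 72",
    "4 + 2 = 6 and 7 + 5 = 12, so the answer is 612",  # a faulty chain
]

def final_answer(text):
    # Take whatever follows the last "answer is" as the final answer.
    return text.rsplit("answer is", 1)[1].strip()

def self_consistency(chains):
    # Marginalize over rationales: the most common final answer wins.
    return Counter(final_answer(c) for c in chains).most_common(1)[0][0]
```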