“STaR: Bootstrapping Reasoning With Reasoning”, 2022-03-28:
Generating step-by-step “chain-of-thought” rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference.
We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the Self-Taught Reasoner (STaR), relies on a simple loop [self-distillation]: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat.
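A minimal sketch of that loop, assuming hypothetical helpers: `generate` stands in for sampling a rationale and answer from the model, and `finetune` for a fine-tuning run; none of these names come from the paper's code.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (question, rationale, answer)

def star_loop(
    base_model,
    dataset: List[Tuple[str, str]],  # (question, answer) pairs, no rationales
    few_shot_prompt: str,            # a handful of worked rationale examples
    generate: Callable,              # (model, prompt) -> (rationale, predicted_answer)
    finetune: Callable,              # (model, examples) -> fine-tuned model
    n_iterations: int = 10,
):
    model = base_model
    for _ in range(n_iterations):
        kept: List[Example] = []
        for question, answer in dataset:
            # Generate a rationale and answer, prompted with few-shot examples.
            rationale, predicted = generate(model, few_shot_prompt + question)
            if predicted != answer:
                # "Rationalization": retry with the correct answer as a hint.
                hinted = f"{question}\n(Hint: the answer is {answer})"
                rationale, predicted = generate(model, few_shot_prompt + hinted)
            if predicted == answer:
                # Keep only rationales that ultimately yielded correct answers.
                kept.append((question, rationale, answer))
        # Fine-tune on the kept rationales, then repeat. (The paper restarts
        # from the original pre-trained model each iteration rather than
        # stacking fine-tunes on top of one another.)
        model = finetune(base_model, kept)
    return model
```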
We show that STaR substantially improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA.
Thus, STaR lets a model improve itself by learning from its own generated reasoning.