“FLAN: Finetuned Language Models Are Zero-Shot Learners”, Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le, 2021-09-03:

[blog] This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning—finetuning language models on a collection of tasks described via instructions—substantially boosts zero-shot performance on unseen tasks.

We take a 137B-parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
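To make "verbalized via natural language instruction templates" concrete, here is a minimal sketch of how one labeled example might be turned into an instruction-style input and target. The template wording, the `verbalize` helper, and the example record are all hypothetical illustrations, not the templates or code used in the paper.

```python
import random

# Hypothetical templates for a natural language inference task.
# The wording is illustrative only and is not taken from the paper.
NLI_TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis?",
    "{premise}\nBased on the paragraph above, can we conclude that \"{hypothesis}\"?",
    "Read the following and decide whether the hypothesis follows.\n{premise}\nHypothesis: {hypothesis}",
]


def verbalize(example, templates, rng):
    """Turn one labeled example into an (input text, target text) pair.

    A template is sampled at random so that the same task is phrased
    in several different ways across the finetuning data.
    """
    template = rng.choice(templates)
    prompt = template.format(
        premise=example["premise"],
        hypothesis=example["hypothesis"],
    )
    return prompt, example["label"]


# Hypothetical example record, made up for illustration.
example = {
    "premise": "The dog is asleep on the sofa.",
    "hypothesis": "An animal is resting.",
    "label": "yes",
}

prompt, target = verbalize(example, NLI_TEMPLATES, random.Random(0))
```

Sampling among several phrasings per task is a design choice assumed here: it would discourage a model from latching onto a single fixed prompt format.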

We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20⁄25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze.
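One way to read "unseen task types" is that every task of the evaluated type is excluded from finetuning. The sketch below shows such a split under that assumption. The cluster names, the task names, and the `split_for_held_out` helper are placeholders invented for illustration and do not reproduce the paper's task list.

```python
# Hypothetical grouping of tasks by task type; all names are placeholders.
TASK_CLUSTERS = {
    "nli": ["nli_task_a", "nli_task_b"],
    "qa": ["qa_task_a", "qa_task_b"],
    "sentiment": ["sentiment_task_a"],
}


def split_for_held_out(clusters, held_out):
    """Return (finetuning tasks, evaluation tasks) with one task type held out.

    No task from the held-out type appears in the finetuning list, so the
    evaluation measures zero-shot transfer to an unseen task type.
    """
    if held_out not in clusters:
        raise KeyError(f"unknown task type: {held_out}")
    train = [
        task
        for name, tasks in clusters.items()
        if name != held_out
        for task in tasks
    ]
    return train, list(clusters[held_out])


train_tasks, eval_tasks = split_for_held_out(TASK_CLUSTERS, "nli")
```

Holding out the whole type, rather than individual datasets, keeps closely related tasks from leaking into finetuning.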

Ablation studies reveal that the number of tasks and model scale are key to the success of instruction tuning.