"ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization", Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang (2022-01-18):

We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting.

While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery: task scaling can be an efficient alternative to model scaling; i.e., with an extremely large number of tasks, model size has little impact on performance. [maybe just ceiling effect?]

Figure 1: Task scaling vs. model scaling. With an extremely large number of training tasks, the model size has little impact on performance. Moreover, task scaling consistently improves performance at various model scales. For the reference baselines, RoBERTa-Large was finetuned in a fully-supervised manner, while Pangu Alpha and CPM-2 were zero-shot prompted. All models were trained and evaluated in Chinese.

Our results show that task scaling can substantially improve training efficiency by 30× in FLOPs. Moreover, we present a prompting method that incorporates a genetic algorithm to automatically search for the best prompt for unseen tasks, along with a few other improvements.
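The abstract does not spell out the genetic algorithm, but the general scheme it names can be sketched as follows: maintain a population of candidate prompts, score each one (e.g. by zero-shot accuracy on a small held-out set), keep the best, and breed new candidates via crossover and mutation. The token pool, prompt length, and scoring function below are illustrative placeholders, not the paper's actual setup:

```python
import random

def genetic_prompt_search(candidates, score, generations=10, pop_size=8,
                          mutation_rate=0.3, seed=0):
    """Evolve a high-scoring prompt from a pool of prompt tokens.

    `candidates` is a pool of prompt fragments; `score` maps a prompt
    (a tuple of fragments) to a fitness value, e.g. zero-shot accuracy
    on a dev set. Both are stand-ins for whatever ZeroPrompt uses.
    """
    rng = random.Random(seed)
    # Initialize a random population of 3-fragment prompts.
    pop = [tuple(rng.sample(candidates, 3)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]           # selection: keep the top half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))
            child = list(a[:cut] + b[cut:])        # one-point crossover
            if rng.random() < mutation_rate:       # mutation: swap in a random fragment
                child[rng.randrange(len(child))] = rng.choice(candidates)
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=score)

# Toy usage: prefer longer fragments as a stand-in for a real dev-set score.
tokens = ["Question:", "Answer:", "Task:", "Input:", "Output:", "Label:"]
best = genetic_prompt_search(tokens, score=lambda p: sum(len(t) for t in p),
                             generations=20)
```

The key design property is that the incumbent best prompt always survives to the next generation, so fitness is monotonically non-decreasing; the real method would replace the toy score with an actual evaluation of each prompt on unseen-task dev data.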

Empirically, ZeroPrompt substantially improves both the efficiency and the performance of zero-shot learning across a variety of academic and production datasets.