“Co-Training Improves Prompt-Based Learning for Large Language Models”, Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag, 2022-02-02:

We demonstrate that co-training (Blum & Mitchell 1998) can improve the performance of prompt-based learning by using unlabeled data. While prompting has emerged as a promising paradigm for few-shot and zero-shot learning, it is often brittle and requires much larger models compared to the standard supervised setup.

We find that co-training makes it possible to improve the original prompt model and at the same time learn a smaller, downstream task-specific model. In the case where we only have partial access to a prompt model (eg. output probabilities from GPT-3 (Brown et al 2020)) we learn a calibration model over the prompt outputs. When we have full access to the prompt model’s gradients but full finetuning remains prohibitively expensive (eg. T0 (Sanh et al 2021)), we learn a set of continuous soft prompt vectors to iteratively update the prompt model.

We find that models trained in this manner can improve performance on challenging datasets where there is currently a large gap between prompt-based learning and fully-supervised models.
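
For readers unfamiliar with co-training, the general loop in the style of Blum & Mitchell 1998 can be sketched as follows. This is a toy illustration, not the paper's method: the paper pairs a prompt model with a smaller task-specific model, whereas this sketch assumes two 1-D feature "views" and uses a nearest-centroid classifier as a stand-in for both learners. All function names and parameters (`fit_centroids`, `predict`, `co_train`, `rounds`, `per_round`) are hypothetical.

```python
def fit_centroids(xs, ys):
    """Toy 'model' for one view: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, confidence) for one feature value.

    Confidence is the distance gap between the nearest and second-nearest
    centroid (0.0 if only one class has been seen).
    """
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(labeled, unlabeled, rounds=5, per_round=4):
    """Generic two-view co-training (assumed setup, not the paper's).

    labeled:   list of ((view_a, view_b), label)
    unlabeled: list of (view_a, view_b)
    Each round, the model on one view pseudo-labels the unlabeled points it
    is most confident about and hands them to the other view's training set.
    """
    train_a = [(a, y) for (a, _b), y in labeled]
    train_b = [(b, y) for (_a, b), y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        # View 0 teaches view 1, then view 1 teaches view 0.
        for teacher_view, student_train in ((0, train_b), (1, train_a)):
            if not pool:
                break
            teacher_train = train_a if teacher_view == 0 else train_b
            teacher = fit_centroids(*zip(*teacher_train))
            # Rank the pool by the teacher's confidence, most confident first.
            scored = sorted(
                pool,
                key=lambda p: predict(teacher, p[teacher_view])[1],
                reverse=True,
            )
            chosen, pool = scored[:per_round], scored[per_round:]
            for p in chosen:
                label, _ = predict(teacher, p[teacher_view])
                # The student sees only its own view plus the pseudo-label.
                student_train.append((p[1 - teacher_view], label))
    return fit_centroids(*zip(*train_a)), fit_centroids(*zip(*train_b))
```

Calling `co_train(labeled, unlabeled)` returns one fitted model per view. The design point the sketch tries to capture is that each learner is trained on labels produced by the other, so unlabeled data can improve both, provided the two views carry complementary information.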