“GPT-1: Improving Language Understanding With Unsupervised Learning”, 2018-06-11:
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing.
Our approach is a combination of two existing ideas: transformers and unsupervised pre-training.
These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying it to larger and more diverse datasets.
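The recipe the post describes can be sketched as two stages: first learn from unlabeled text alone, then use what was learned for a supervised task. The toy below is a hypothetical, deliberately tiny stand-in (bigram counts instead of a Transformer, a single threshold instead of fine-tuning); the corpus, feature, and classifier are all illustrative assumptions, not the paper's actual method.

```python
from collections import Counter, defaultdict

def pretrain_bigrams(corpus):
    """Stage 1 (unsupervised): learn next-token counts from raw, unlabeled text.
    This stands in for language-model pre-training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def featurize(text, counts):
    """Turn a sentence into one feature: average bigram familiarity
    under the pre-trained statistics."""
    tokens = text.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return sum(counts[a][b] for a, b in pairs) / len(pairs)

# Unlabeled corpus for stage 1 (illustrative).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
]
counts = pretrain_bigrams(corpus)

# Stage 2 (supervised): a handful of labeled examples, used only to pick
# the decision threshold on the pre-trained feature -- the analogue of
# fine-tuning a small task head on top of a pre-trained representation.
labeled = [
    ("the cat sat", 1),
    ("the dog sat", 1),
    ("quantum flux divergence", 0),
    ("purple monoid spectra", 0),
]

def fit_threshold(labeled, counts):
    best_t, best_acc = 0.0, -1
    for t in sorted(featurize(text, counts) for text, _ in labeled):
        acc = sum((featurize(x, counts) >= t) == bool(y) for x, y in labeled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = fit_threshold(labeled, counts)

def predict(text):
    return int(featurize(text, counts) >= threshold)
```

The point of the sketch is the division of labor: almost all of the "knowledge" comes from the unlabeled corpus, and the supervised step only has to fit one parameter, which is why small labeled datasets suffice.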