“ULMFiT: Universal Language Model Fine-Tuning for Text Classification”, Jeremy Howard, Sebastian Ruder, 2018-01-18:

[retrospective] Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model.
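Two of the paper's key fine-tuning techniques are slanted triangular learning rates (a short linear warm-up followed by a long linear decay) and discriminative fine-tuning (each lower layer trained with the learning rate of the layer above divided by 2.6). A minimal sketch of both, using the paper's default hyperparameters (`cut_frac=0.1`, `ratio=32`); the layer count and `lr_max` here are illustrative:

```python
import math

def slanted_triangular_lr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular learning rate: linear warm-up for the first
    cut_frac of the T total iterations, then linear decay back down to
    lr_max / ratio."""
    cut = math.floor(T * cut_frac)  # iteration at which the LR peaks
    if t < cut:
        p = t / cut  # warm-up phase: fraction of the way to the peak
    else:
        # decay phase: fraction remaining, reaching 0 at t = T
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(lr_top, n_layers=4, factor=2.6):
    """Per-layer learning rates, lowest layer first: each layer's LR is
    the next-higher layer's LR divided by `factor` (2.6 in the paper)."""
    return [lr_top / factor ** (n_layers - 1 - i) for i in range(n_layers)]
```

At the peak iteration the schedule returns `lr_max` exactly, and at the start and end it returns `lr_max / ratio`; the 2.6 decay factor across layers is the value the authors found empirically.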

Our method outperforms the state-of-the-art on six text classification tasks, reducing the error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data.

We open-source our pretrained models & code.

Figure 3: Validation error rates for supervised and semi-supervised ULMFiT vs. training from scratch with different numbers of training examples on IMDb, TREC-6, and AG (from left to right).

Impact of pretraining: We compare using no pretraining with pretraining on WikiText-103 (Merity et al 2017b) in Table 4: Pretraining is most useful for small and medium-sized datasets, which are most common in commercial applications. However, even for large datasets, pretraining improves performance.

Pretraining            IMDb    TREC-6   AG
Without pretraining    5.63    10.67    5.52
With pretraining       5.00     5.69    5.38

Table 4: Validation error rates for ULMFiT with and without pretraining.