Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.
We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model.
Our method outperforms the state-of-the-art on six text classification tasks, reducing the error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100× more data.
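Among the fine-tuning techniques the paper introduces is the slanted triangular learning rate schedule, which increases the learning rate linearly for a short warm-up fraction of training and then decays it linearly. A minimal sketch of that schedule, using the hyperparameter defaults stated in the paper (`lr_max`, `cut_frac`, `ratio`); the function name is illustrative:

```python
import math

def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at iteration t of T total.

    The rate rises linearly over the first cut_frac of training,
    peaks at lr_max, then decays linearly back to lr_max / ratio.
    Defaults follow the hyperparameters reported in the paper.
    """
    cut = math.floor(T * cut_frac)  # iteration at which the rate peaks
    if t < cut:
        p = t / cut  # linear warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio
```

For example, with `T=100` the rate starts at `lr_max / 32`, peaks at `lr_max` at iteration 10, and decays back to `lr_max / 32` by iteration 100. In practice one would feed these values to an optimizer's per-step learning-rate hook.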
Figure 3: Validation error rates for supervised and semi-supervised ULMFiT vs. training from scratch with different numbers of training examples on IMDb, TREC-6, and AG (from left to right).
Impact of pretraining: We compare using no pretraining with pretraining on WikiText-103 (Merity et al., 2017b) in Table 4. Pretraining is most useful for small and medium-sized datasets, which are the most common in commercial applications. However, even for large datasets, pretraining improves performance.
Pretraining           IMDb    TREC-6    AG
Without pretraining   5.63    10.67     5.52
With pretraining      5.00     5.69     5.38

Table 4: Validation error rates for ULMFiT with and without pretraining.