“Memorization versus Generalization in Pre-Trained Language Models”, Michael Tänzer, Sebastian Ruder, Marek Rei (2021-04-16):

State-of-the-art pre-trained language models have been shown to memorize facts and perform well with limited amounts of training data.

To gain a better understanding of how these models learn, we study their generalization and memorization capabilities in noisy and low-resource scenarios.

We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets.

However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition.

To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
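As a rough illustration of the prototypical-network idea underlying such an extension (not the authors' exact architecture), each class is represented by a prototype, the mean of its support-set embeddings, and new tokens are labeled by their nearest prototype. A minimal sketch with NumPy and toy 2-D embeddings:

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """Class prototype = mean of the support embeddings for that class."""
    return np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def classify(query_emb, protos):
    """Assign each query embedding to its nearest prototype (Euclidean)."""
    # dists[i, c] = distance from query i to the prototype of class c
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Toy example: two classes, a handful of support points per class.
support = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, num_classes=2)
queries = np.array([[0.05, 0.05], [1.0, 0.9]])
print(classify(queries, protos).tolist())  # → [0, 1]
```

In a few-shot NER setting the embeddings would come from the pre-trained encoder rather than being fixed 2-D vectors; the appeal of the prototype head is that it needs only a handful of labeled examples per class to form usable class representatives.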