“Dynamic Evaluation of Transformer Language Models”, Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals (2019-04-17):

This research note combines two methods that have recently improved the state-of-the-art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long-range dependencies in sequential data.

Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to re-occurring sequential patterns.
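The idea can be illustrated with a toy sketch (this is a hypothetical unigram model for illustration, not the paper's Transformer-XL setup): while evaluating a sequence, take a gradient step on each observed symbol, so the model's parameters drift toward the recent history and re-occurring symbols receive higher probability later on.

```python
import numpy as np

def dynamic_eval(logits, seq, lr=0.5):
    """Evaluate `seq` (a list of symbol ids) one symbol at a time under a
    softmax unigram model, taking an SGD step on the cross-entropy loss
    after each symbol (the dynamic-evaluation update)."""
    logits = logits.copy()
    total_logprob = 0.0
    for sym in seq:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        total_logprob += np.log(probs[sym])
        # Gradient of cross-entropy w.r.t. logits: probs - one_hot(sym).
        grad = probs.copy()
        grad[sym] -= 1.0
        logits -= lr * grad  # fit the model to the recent history
    return total_logprob, logits

vocab = 4
logits = np.zeros(vocab)            # uniform starting distribution
seq = [2] * 10                      # a re-occurring pattern
static_lp = 10 * np.log(1 / vocab)  # log-probability under the frozen model
dyn_lp, _ = dynamic_eval(logits, seq)
print(dyn_lp > static_lp)  # True: adaptation raises p(repeated symbol)
```

In the paper's actual setting the same principle applies, but the model is a Transformer-XL and the updates are applied over segments of the test sequence rather than single symbols.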

By applying dynamic evaluation to Transformer-XL models, we improve the state-of-the-art on enwik8 (0.99 → 0.94 bits/char), text8 (1.08 → 1.04 bits/char), and WikiText-103 (18.3 → 16.4 perplexity).