“Dynamic Evaluation of Transformer Language Models”, 2019-04-17:
This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long-range dependencies in sequential data.
Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to recurring sequential patterns.
By applying dynamic evaluation to Transformer-XL models, we improve the state of the art on enwik8 (0.99 → 0.94 bits/char), text8 (1.08 → 1.04 bits/char), and WikiText-103 (18.3 → 16.4 perplexity).
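The core idea of dynamic evaluation can be illustrated in a few lines: score the sequence segment by segment, and after scoring each segment, take a gradient step on that segment's loss so the model adapts to recently seen patterns. The sketch below uses a toy unigram softmax model rather than Transformer-XL, and the segment size and learning rate are illustrative choices, not the paper's hyperparameters; it is meant only to show why adaptation lowers loss on repetitive sequences.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def nll(logits, tokens):
    # Mean negative log-likelihood of tokens under a unigram softmax model.
    p = softmax(logits)
    return -np.mean(np.log(p[tokens]))

def dynamic_eval(tokens, vocab=2, seg=5, lr=1.0):
    """Score a sequence segment by segment; after scoring each segment,
    take one SGD step on that segment's loss (dynamic evaluation)."""
    logits = np.zeros(vocab)
    losses = []
    for i in range(0, len(tokens), seg):
        chunk = tokens[i:i + seg]
        losses.append(nll(logits, chunk))   # score BEFORE adapting
        # Gradient of mean NLL w.r.t. logits: softmax probs minus
        # the empirical token distribution of the segment.
        p = softmax(logits)
        counts = np.bincount(chunk, minlength=vocab) / len(chunk)
        logits -= lr * (p - counts)
    return float(np.mean(losses))

# A re-occurring pattern: the static model stays at log(2) nats/token,
# while the dynamically evaluated model adapts and scores it more cheaply.
tokens = np.array([1] * 40)
static_loss = float(nll(np.zeros(2), tokens))
dynamic_loss = dynamic_eval(tokens)
```

On this repetitive sequence the static model is stuck at uniform predictions, while dynamic evaluation shifts probability mass toward the recently observed token, so its average loss drops below the static baseline — the same mechanism that lets the adapted Transformer-XL assign higher probability to patterns that recur in the evaluation text.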