“On the State-Of-The-Art of Evaluation in Neural Language Models”, Gábor Melis, Chris Dyer, Phil Blunsom (2017-07-18)⁠:

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modeling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation.

We reevaluate several popular architectures and regularization methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularized, outperform more recent models. We establish a new state-of-the-art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
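The tuning described above treats validation performance as a black box: the tuner only queries an objective (train a model with a given configuration, report validation perplexity) and never inspects gradients or internals. The paper itself used a large-scale batched Gaussian-process tuner; as a minimal illustrative sketch, here is plain random search over a hypothetical LSTM hyperparameter space (the parameter names and ranges are illustrative assumptions, not the paper's actual search space), with a toy quadratic objective standing in for model training:

```python
import math
import random

# Hypothetical search space (illustrative only; the paper tuned quantities
# such as learning rate and several dropout rates over its own ranges).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),   # sampled log-uniformly
    "input_dropout": (0.0, 0.8),     # sampled uniformly
    "output_dropout": (0.0, 0.8),    # sampled uniformly
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
        "input_dropout": rng.uniform(*SEARCH_SPACE["input_dropout"]),
        "output_dropout": rng.uniform(*SEARCH_SPACE["output_dropout"]),
    }

def tune(objective, budget=200, seed=0):
    """Black-box tuning: the objective is only queried, never differentiated."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(budget):
        cfg = sample_config(rng)
        val = objective(cfg)  # in practice: validation perplexity of a trained LSTM
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Toy stand-in objective: a quadratic bowl instead of an expensive training run,
# with a fictitious optimum at input_dropout=0.5, output_dropout=0.3.
def toy_objective(cfg):
    return (cfg["input_dropout"] - 0.5) ** 2 + (cfg["output_dropout"] - 0.3) ** 2

best_cfg, best_val = tune(toy_objective)
```

The design point the abstract makes is that when every architecture gets the same tuning budget under the same protocol, the apparent advantage of newer models over a well-regularized LSTM disappears; the sketch shows only the protocol's shape, not the paper's tuner.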