Bibliography (5):
Attention Is All You Need
Recurrent Neural Network Based Language Model § Dynamic Evaluation
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
https://mattmahoney.net/dc/textdata.html
Pointer Sentinel Mixture Models