Bibliography (5):

  1. Attention Is All You Need

  2. Recurrent Neural Network Based Language Model § Dynamic Evaluation

  3. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  4. https://mattmahoney.net/dc/textdata.html

  5. Pointer Sentinel Mixture Models