Bibliography (8):

  1. Attention Is All You Need

  2. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  3. Language Models are Unsupervised Multitask Learners