Bibliography (3):

  1. Attention Is All You Need

  2. Language Models are Unsupervised Multitask Learners

  3. Pointer Sentinel Mixture Models