Bibliography (3):
Attention Is All You Need
Language Models are Unsupervised Multitask Learners
Pointer Sentinel Mixture Models