https://github.com/google-research/google-research/tree/master/fwl
Recurrent Neural Network Based Language Model § Dynamic Evaluation
Pointer Sentinel Mixture Models
Reconsidering the Past: Optimizing Hidden States in Language Models
GPT-3: Language Models are Few-Shot Learners