Bibliography (8):
Attention Is All You Need
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Language Models are Unsupervised Multitask Learners
Wikipedia Bibliography:
Transformer (deep learning architecture)
Neural architecture search (https://en.wikipedia.org/wiki/Neural_architecture_search)
Pareto front
Nvidia
OpenAI § GPT-2 (https://en.wikipedia.org/wiki/OpenAI#GPT-2)