Bibliography (4):
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. NeurIPS.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.
Wikipedia Bibliography:
Power law