Bibliography (4):

  1. Language Models are Unsupervised Multitask Learners

  2. Attention Is All You Need

  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  4. Wikipedia Bibliography:

    1. Power law