Bibliography (6):

  1. https://nn.labml.ai/transformers/retro/model.html

  2. GPT-3: Language Models are Few-Shot Learners

  3. https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf#ai21

  4. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  5. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  6. Wikipedia Bibliography:

    1. Differentiable function