Bibliography (10):

  1. https://arxiv.org/abs/2010.11929

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. GLUE Benchmark

  4. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Vision Transformer)

  5. Attention Is All You Need

  6. https://pile.eleuther.ai/