Bibliography (7):

  1. RoBERTa examples in the fairseq repository: https://github.com/facebookresearch/fairseq/tree/main/examples/roberta

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. Training Compute-Optimal Large Language Models (Chinchilla)

  4. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  5. RoBERTa: A Robustly Optimized BERT Pretraining Approach: https://arxiv.org/pdf/1907.11692.pdf

  6. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding