Bibliography (6):

  1. Densely Connected Convolutional Networks (DenseNet). Huang et al., CVPR 2017.

  2. Trained on 100 Million Words and Still in Shape: BERT Meets British National Corpus. Samuel et al., Findings of EACL 2023.

  3. The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. Warstadt et al., 2023.

  4. X (Twitter) post by @a_stadt: https://x.com/a_stadt/status/1737849248560066794

  5. Llama 2: Open Foundation and Fine-Tuned Chat Models. Touvron et al., 2023.

  6. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Devlin et al., NAACL 2019.