Bibliography (6):

DenseNet: Densely Connected Convolutional Networks
Trained on 100 million words and still in shape: BERT meets British National Corpus
The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
https://x.com/a_stadt/status/1737849248560066794
LLaMA-2: Open Foundation and Fine-Tuned Chat Models
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding