- DenseNet: Densely Connected Convolutional Networks
- Trained on 100 million words and still in shape: BERT meets British National Corpus
- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
- https://x.com/a_stadt/status/1737849248560066794
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding