Bibliography (4):

  1. Jiang et al. (2023). Mistral 7B.

  2. Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

  3. Radford et al. (2019). Language Models are Unsupervised Multitask Learners.

  4. https://github.com/kddubey/pretrain-on-test/