Bibliography (6):

  1. GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020)

  2. Attention Is All You Need (Vaswani et al., 2017)

  3. RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)

  4. DeBERTa: Decoding-enhanced BERT with Disentangled Attention (He et al., 2021)

  5. Language Models are Unsupervised Multitask Learners (Radford et al., 2019)

  6. Microsoft LoRA repository: https://github.com/microsoft/LoRA