Bibliography (5):

  1. GPT-3: Language Models are Few-Shot Learners (Brown et al., NeurIPS 2020)

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., NAACL 2019)

  3. GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019)

  4. DeBERTa: Decoding-enhanced BERT with Disentangled Attention (He et al., ICLR 2021)

  5. DSEE: https://github.com/VITA-Group/DSEE