- GPT-3: Language Models are Few-Shot Learners
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT-2: Language Models are Unsupervised Multitask Learners
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- https://github.com/VITA-Group/DSEE