- GPT-3: Language Models are Few-Shot Learners
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT-2: Language Models are Unsupervised Multitask Learners
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- https://github.com/VITA-Group/DSEE