- GPT-3: Language Models are Few-Shot Learners
- Attention Is All You Need
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- Language Models are Unsupervised Multitask Learners
- https://github.com/microsoft/LoRA