BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBERTa: A Robustly Optimized BERT Pretraining Approach
OPT: Open Pre-trained Transformer Language Models
GPT-3: Language Models are Few-Shot Learners
FOLIO: Natural Language Reasoning with First-Order Logic (https://github.com/Yale-LILY/FOLIO)