RoBERTa example in fairseq: https://github.com/facebookresearch/fairseq/tree/main/examples/roberta
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
XLNet: Generalized Autoregressive Pretraining for Language Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Wikipedia Bibliography: