Bibliography (8):

  1. Attention Is All You Need

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. Language Models are Unsupervised Multitask Learners

  4. RoBERTa: A Robustly Optimized BERT Pretraining Approach

  5. Cross-lingual Language Model Pretraining

  6. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

  7. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  8. CTRL: A Conditional Transformer Language Model For Controllable Generation