Bibliography:

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  2. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  3. Language Models are Unsupervised Multitask Learners

  4. Defending Against Neural Fake News (Grover)

  5. $2019