Bibliography (12):

  1. The Pile: An 800GB Dataset of Diverse Text for Language Modeling

  2. GPT-3: Language Models are Few-Shot Learners

  3. CPM: A Large-scale Generative Chinese Pre-trained Language Model

  4. UNITER: UNiversal Image-TExt Representation Learning

  5. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  6. RoBERTa: A Robustly Optimized BERT Pretraining Approach

  7. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  8. GPT Understands, Too

  9. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

  10. Controllable Generation from Pre-trained Language Models via Inverse Prompting

  11. FastMoE: https://fastmoe.ai/

  12. Wikipedia: Akaike information criterion