Bibliography (7):

  1. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)

  2. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer (arXiv:2010.11934)

  3. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (arXiv:2101.03961, https://arxiv.org/pdf/2101.03961.pdf#page=18)

  4. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (arXiv:1804.07461)

  5. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension (arXiv:1705.03551)

  6. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (arXiv:1905.00537)

  7. Attention Is All You Need (arXiv:1706.03762)