Bibliography (10):

  1. https://www.youtube.com/watch?v=hgSGHusDx7M

  2. https://github.com/google/trax/blob/master/trax/examples/Terraformer_from_scratch.ipynb

3. Vaswani et al., "Attention Is All You Need" (2017). arXiv:1706.03762

  4. Raffel et al., "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (T5) (2019). arXiv:1910.10683

  5. Zaheer et al., "Big Bird: Transformers for Longer Sequences" (2020). arXiv:2007.14062

  6. Shazeer et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (2017). arXiv:1701.06538

  7. Lei et al., "Simple Recurrent Units for Highly Parallelizable Recurrence" (SRU) (2017). arXiv:1709.02755

  8. Kaiser and Sutskever, "Neural GPUs Learn Algorithms" (2015). arXiv:1511.08228

  9. Brown et al., "Language Models are Few-Shot Learners" (GPT-3) (2020). arXiv:2005.14165

10. Wikipedia: "arXiv". https://en.wikipedia.org/wiki/ArXiv