Bibliography (3):

  1. Alternating Updates for Efficient Transformers. https://research.google/blog/alternating-updates-for-efficient-transformers/

  2. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

  3. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems