Bibliography (3):
https://research.google/blog/alternating-updates-for-efficient-transformers/
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems