Bibliography (5):

https://research.google/blog/scaling-vision-with-sparse-mixture-of-experts/
https://github.com/google-research/vmoe
Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
ImageNet Large Scale Visual Recognition Challenge