Bibliography (15):

  1. Attention Is All You Need

  2. ImageNet Large Scale Visual Recognition Challenge

  3. https://arxiv.org/pdf/2305.09828#page=11

  4. https://arxiv.org/pdf/2305.09828#page=12

  5. Scaling MLPs: A Tale of Inductive Bias

  6. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  7. ImageNet: A Large-Scale Hierarchical Image Database

  8. https://arxiv.org/pdf/2305.09828#page=3

  9. Language Models are Unsupervised Multitask Learners

  10. Do Vision Transformers See Like Convolutional Neural Networks?

  11. What do Vision Transformers Learn? A Visual Exploration

  12. Training data-efficient image transformers & distillation through attention

  13. Vision Transformers Need Registers

  14. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

  15. Wikipedia Bibliography:

    1. CIFAR-10