-
Attention Is All You Need
-
ImageNet Large Scale Visual Recognition Challenge
-
https://arxiv.org/pdf/2305.09828#page=11
-
https://arxiv.org/pdf/2305.09828#page=12
-
Scaling MLPs: A Tale of Inductive Bias
-
Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
-
ImageNet: A Large-Scale Hierarchical Image Database
-
https://arxiv.org/pdf/2305.09828#page=3
-
Language Models are Unsupervised Multitask Learners
-
Do Vision Transformers See Like Convolutional Neural Networks?
-
What do Vision Transformers Learn? A Visual Exploration
-
Training data-efficient image transformers & distillation through attention
-
Vision Transformers Need Registers
-
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
-