- ImageNet Large Scale Visual Recognition Challenge
- Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
- DeepViT: Towards Deeper Vision Transformer
- Towards Learning Convolutions from Scratch
- Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias
- Homotopy Analysis for Tensor PCA