- Exploring the Limits of Weakly Supervised Pretraining
- Deep Residual Learning for Image Recognition
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
- ImageNet Large Scale Visual Recognition Challenge
- A Simple Framework for Contrastive Learning of Visual Representations
- SimCLRv2: Big Self-Supervised Models are Strong Semi-Supervised Learners
- SEER: Self-supervised Pretraining of Visual Features in the Wild
- BEiT: BERT Pre-Training of Image Transformers
- https://arxiv.org/pdf/2201.08371#page=13&org=facebook
- CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the ‘zero-shot’ capabilities of GPT-2 and GPT-3. (A minimal zero-shot usage sketch follows after this list.)
- Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
- Do ImageNet Classifiers Generalize to ImageNet?
- The Devil is in the Tails: Fine-grained Classification in the Wild
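
The CLIP entry above describes zero-shot classification by supplying the names of the visual categories as text. As a rough illustration (not taken from any of the linked papers), the sketch below uses the Hugging Face `transformers` CLIP classes; the checkpoint name, the class names, the prompt template, and the image path are all assumptions made for the example.

```python
# Minimal sketch of CLIP-style zero-shot classification, assuming the
# Hugging Face checkpoint "openai/clip-vit-base-patch32" and a local
# image "photo.jpg" (both illustrative choices, not from the source).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# "Providing the names of the visual categories" becomes a set of text prompts.
class_names = ["dog", "cat", "car"]
prompts = [f"a photo of a {name}" for name in class_names]

image = Image.open("photo.jpg")
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image-to-text similarity scores; softmax turns
# them into a probability distribution over the provided category names.
probs = outputs.logits_per_image.softmax(dim=-1)
for name, p in zip(class_names, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```

Swapping in a different label set requires no retraining, only a new list of prompts, which is the sense in which the CLIP entry calls the behavior zero-shot.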