https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
An Open Source Implementation of CLIP
ImageNet: A Large-Scale Hierarchical Image Database
Microsoft COCO: Common Objects in Context
Stable Diffusion Public Release
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
ImageNet Large Scale Visual Recognition Challenge
https://arxiv.org/abs/2212.00794
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Reproducible scaling laws for contrastive language-image learning
https://laion.ai/blog/large-openclip/
Training Deep Nets with Sublinear Memory Cost