LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
RedCaps: web-curated image-text data created by the people, for the people
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
WebVision Database: Visual Learning and Understanding from Web Data
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
WebVision Challenge: Visual Learning and Understanding With Web Data
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web
ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision