“LAION-400-Million Open Dataset”, Christoph Schuhmann, 2021-08-20:

We present LAION-400M: 400M English (image, text) pairs; see also our Data Centric AI NeurIPS Workshop 2021 paper. The LAION-400M dataset is entirely open and freely accessible.

The image-text pairs have been extracted from the Common Crawl web data dump and are from random web pages crawled between 2014 and 2021…You can find

LAION-400M Open Dataset structure: We produced the dataset in several formats to address the various use cases: