“Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset For Automatic Image Captioning”, Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut, 2018-07-01:

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images [3.3M training] than the MS-COCO dataset (Lin et al 2014) and represents a wider variety of both images and image caption styles. We achieve this by extracting and filtering image caption annotations from billions of webpages.

We also present quantitative evaluations of a number of image captioning models and show that a model architecture based on Inception-ResNet-v2 (Szegedy et al 2016) for image-feature extraction and Transformer (Vaswani et al 2017) for sequence modeling achieves the best performance when trained on the Conceptual Captions dataset.