“Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification”, Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang, 2015-06-07:

Large-scale supervised datasets are crucial for training convolutional neural networks (CNNs) on various computer vision problems. However, obtaining a massive amount of well-labeled data is usually very expensive and time-consuming.

In this paper, we introduce a general framework to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels. We model the relationships between images, class labels, and label noise with a probabilistic graphical model and further integrate it into an end-to-end deep learning system.
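To make the idea of relating class labels to label noise concrete, here is a minimal sketch of one common simplification: a class-conditional noise model in which a hypothetical transition matrix gives the probability of each noisy label given the true label. This is not the paper's exact graphical model, which the abstract does not spell out; the function name `true_label_posterior`, the matrix `transition`, and all numbers are illustrative assumptions.

```python
def true_label_posterior(class_probs, transition, noisy_label):
    """Posterior over the true label given a classifier output and a noisy label.

    Assumed (illustrative) model, not taken from the paper:
      class_probs[y]    = p(y | image), e.g. a CNN softmax output
      transition[y][n]  = p(noisy label n | true label y)
    Bayes' rule then gives p(y | image, n) proportional to
    class_probs[y] * transition[y][n].
    """
    unnormalized = [
        p * transition[y][noisy_label] for y, p in enumerate(class_probs)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("noisy label has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical 3-class example: the classifier prefers class 0,
# but the noisy label says class 1.
class_probs = [0.7, 0.2, 0.1]
transition = [
    [0.8, 0.1, 0.1],  # true class 0 is rarely mislabeled as 1 or 2
    [0.3, 0.6, 0.1],  # true class 1 is often mislabeled as 0
    [0.1, 0.1, 0.8],
]
posterior = true_label_posterior(class_probs, transition, noisy_label=1)
```

Under these assumed numbers, the posterior shifts toward class 1 because a noisy label of 1 is much more likely when the true class is 1 than when it is 0. A corrected label distribution of this kind could then serve as a soft training target, which is one way a noise model can be combined with a CNN.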

To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels [Clothing-1M].

Experiments on this dataset indicate that our approach can better correct the noisy labels and improve the performance of trained CNNs.