“Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification”, 2015-06-07 ():
Large-scale supervised datasets are crucial to train convolutional neural networks (CNNs) for various computer vision problems. However, obtaining a massive amount of well-labeled data is usually very expensive and time consuming.
In this paper, we introduce a general framework to train CNNs with only a limited number of clean labels and millions of easily obtained noisy labels. We model the relationships between images, class labels and label noises with a probabilistic graphical model and further integrate it into an end to end deep learning system.
To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels [Clothing-1M].
Experiments on this dataset indicate that our approach can better correct the noisy labels and improves the performance of trained CNNs.
See Also:
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images
WebVision Database: Visual Learning and Understanding from Web Data
Active Learning for Convolutional Neural Networks: A Core-Set Approach
SimCLRv2: Big Self-Supervised Models are Strong Semi-Supervised Learners
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm