Beyond neural scaling laws: beating power law scaling via data pruning
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification
Deep Residual Learning for Image Recognition
MobileNetV2: Inverted Residuals and Linear Bottlenecks (https://arxiv.org/abs/1801.04381#google)
DenseNet: Densely Connected Convolutional Networks
Going Deeper with Convolutions