Bibliography (7):

  1. Beyond neural scaling laws: beating power law scaling via data pruning

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. Clothing-1M: Learning from Massive Noisy Labeled Data for Image Classification

  4. Deep Residual Learning for Image Recognition

  5. MobileNetV2: Inverted Residuals and Linear Bottlenecks (https://arxiv.org/abs/1801.04381#google)

  6. DenseNet: Densely Connected Convolutional Networks

  7. Going Deeper with Convolutions