Bibliography (9):

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  2. Language Models are Unsupervised Multitask Learners

  3. Generative Pretraining from Pixels (iGPT)

  4. ImageNet Large Scale Visual Recognition Challenge

  5. CIFAR-10 and CIFAR-100 Datasets

  6. STL-10 Dataset

  7. The Bitter Lesson