- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Language Models are Unsupervised Multitask Learners
- iGPT: Generative Pretraining from Pixels
- ImageNet Large Scale Visual Recognition Challenge
- CIFAR-10 and CIFAR-100 Datasets
- STL-10 Dataset
- The Bitter Lesson
-