BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Language Models are Unsupervised Multitask Learners
Generative Pretraining from Pixels (iGPT)
Image GPT (iGPT), the accompanying OpenAI blog post: "We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples." A minimal code sketch of this idea appears after the list.
ImageNet Large Scale Visual Recognition Challenge
CIFAR-10 and CIFAR-100 Datasets
STL-10 Dataset
The Bitter Lesson
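
The iGPT entry above describes the one technique in this list compact enough to sketch: flatten an image into a raster-order sequence of discrete pixel values and train a causal transformer on the standard next-token objective, with pixels in place of words. Below is a minimal, illustrative PyTorch sketch of that idea. The `PixelTransformer` name and every hyperparameter are assumptions for illustration only; the actual iGPT additionally reduces colors to a 9-bit palette and trains far larger models.

```python
# A minimal sketch of the iGPT idea, assuming PyTorch: treat an image as a
# raster-order sequence of discrete pixel values and train a causal
# (decoder-only) transformer to predict each pixel from the pixels before
# it -- the GPT objective, with pixels as tokens. All names and
# hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelTransformer(nn.Module):
    def __init__(self, n_vals=256, seq_len=32 * 32, d_model=128,
                 n_heads=4, n_layers=4):
        super().__init__()
        # One embedding per pixel intensity; iGPT itself clusters RGB values
        # into a reduced 9-bit palette, which we skip for brevity.
        self.tok_emb = nn.Embedding(n_vals, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_vals)  # logits over next pixel value

    def forward(self, x):  # x: (batch, seq) of int64 pixel values
        t = x.size(1)
        h = self.tok_emb(x) + self.pos_emb[:t]
        # Causal mask: position i may attend only to positions <= i.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device),
                          diagonal=1)
        return self.head(self.blocks(h, mask=mask))


model = PixelTransformer()
# Toy batch: 8 random grayscale 32x32 "images" flattened to sequences.
imgs = torch.randint(0, 256, (8, 32 * 32))
logits = model(imgs[:, :-1])                 # predict pixel i from pixels < i
loss = F.cross_entropy(logits.reshape(-1, 256), imgs[:, 1:].reshape(-1))
loss.backward()                              # standard next-token training step
print(f"next-pixel cross-entropy: {loss.item():.3f}")
```

Generating an image completion then works exactly as in text generation: feed the known top rows of pixels and repeatedly sample the next pixel from the model's logits.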