Bibliography (9):

‘end-to-end’ directory
Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
Language Models are Unsupervised Multitask Learners
ImageNet Large Scale Visual Recognition Challenge
https://github.com/EleutherAI/openwebtext
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Wikipedia Bibliography:
1. Fourier transform
2. Block matrix, Block diagonal matrices: https://en.wikipedia.org/wiki/Block_matrix#Block_diagonal_matrices