Bibliography (3):

ImageNet Large Scale Visual Recognition Challenge
Compressive Transformers for Long-Range Sequence Modeling
Attention Is All You Need