Bibliography (6):

  1. Attention Is All You Need

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Vision Transformer, ViT)

  4. ImageNet Large Scale Visual Recognition Challenge

  5. BEiT source code (microsoft/unilm repository): https://github.com/microsoft/unilm/tree/master/beit

  6. Wikipedia: Image segmentation