Bibliography:

  1. ​ β€˜end-to-end’ tag

  2. ​ Attention Is All You Need

  3. ​ fully-connected#mlp-mixer

  4. MLP-Mixer: An all-MLP Architecture for Vision

  5. DETR: End-to-End Object Detection with Transformers

  6. Focal Loss for Dense Object Detection

  7. Mask R-CNN

  8. Deep Residual Learning for Image Recognition

  9. Training data-efficient image transformers & distillation through attention

  10. DINO: Emerging Properties in Self-Supervised Vision Transformers