Bibliography:

  1. ​ β€˜end-to-end’ tag

  2. ​ Attention Is All You Need

  3. ​ fully-connected#mlp-mixer

  4. MLP-Mixer: An all-MLP Architecture for Vision

  5. DETR: End-to-End Object Detection with Transformers

  6. Focal Loss for Dense Object Detection

  7. Mask R-CNN

  8. Deep Residual Learning for Image Recognition

  9. Training data-efficient image transformers & distillation through attention

  10. DINO: Emerging Properties in Self-Supervised Vision Transformers