Bibliography (11):

  1. Replacing softmax with ReLU in Vision Transformers

  2. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  3. Pay Less Attention with Lightweight and Dynamic Convolutions

  4. Attention Is All You Need

  5. Layer Normalization