Bibliography (6):

  1. PAPA code repository (GitHub): https://github.com/schwartz-lab-NLP/papa

  2. Attention Is All You Need (Vaswani et al., 2017)

  3. DeBERTa: Decoding-enhanced BERT with Disentangled Attention (He et al., 2021)

  4. Pay Attention to MLPs (Liu et al., 2021)

  5. FNet: Mixing Tokens with Fourier Transforms (Lee-Thorp et al., 2021)

  6. What Does BERT Look At? An Analysis of BERT’s Attention (Clark et al., 2019)