Bibliography (6):
PAPA code repository (schwartz-lab-NLP): https://github.com/schwartz-lab-NLP/papa
Vaswani et al., 2017. "Attention Is All You Need". NeurIPS 2017.
He et al., 2021. "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". ICLR 2021.
Liu et al., 2021. "Pay Attention to MLPs". NeurIPS 2021.
Lee-Thorp et al., 2022. "FNet: Mixing Tokens with Fourier Transforms". NAACL 2022.
Clark et al., 2019. "What Does BERT Look At? An Analysis of BERT’s Attention". BlackboxNLP Workshop at ACL 2019.