Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
Synthesizer: Rethinking Self-Attention in Transformer Models
Simple Local Attentions Remain Competitive for Long-Context Tasks
Wikipedia Bibliography: