Bibliography (12):

Attention Is All You Need
Reformer: The Efficient Transformer
Linformer: Self-Attention with Linear Complexity
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Sparse Sinkhorn Attention
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
Synthesizer: Rethinking Self-Attention in Transformer Models
Generating Long Sequences with Sparse Transformers
Longformer: The Long-Document Transformer
BigBird: Transformers for Longer Sequences
Simple Local Attentions Remain Competitive for Long-Context Tasks
Wikipedia Bibliography:
1. Pareto front