Bibliography (12):

  1. Attention Is All You Need

  2. Reformer: The Efficient Transformer

  3. Linformer: Self-Attention with Linear Complexity

  4. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

  5. Sparse Sinkhorn Attention

  6. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

  7. Synthesizer: Rethinking Self-Attention in Transformer Models

  8. Generating Long Sequences with Sparse Transformers

  9. Longformer: The Long-Document Transformer

  10. BigBird: Transformers for Longer Sequences

  11. Simple Local Attentions Remain Competitive for Long-Context Tasks

  12. Wikipedia Bibliography:

    1. Pareto front