See Also

Links
- “CosFormer: Rethinking Softmax in Attention”, Et Al 2022 (2022-02-17)
- “Self-attention Does Not Need 𝒪(n²) Memory”, 2021 (2021-12-10)
- “Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Et Al 2021 (2021-10-29)
- “On Learning the Transformer Kernel”, Et Al 2021 (2021-10-15)
- “A Dot Product Attention Free Transformer”, Et Al 2021 (2021-10-05)
- “Luna: Linear Unified Nested Attention”, Et Al 2021 (2021-06-03)
- “Beyond Self-attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Et Al 2021 (2021-05-05)
- “Sub-Linear Memory: How to Make Performers SLiM”, Et Al 2020 (2020-12-21)
- “LambdaNetworks: Modeling Long-range Interactions without Attention”, 2020 (2020-09-28)
- “AFT: An Attention Free Transformer”, 2020 (2020-09-28)
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Et Al 2020 (2020-06-29)
- “Linformer: Self-Attention With Linear Complexity”, Et Al 2020 (2020-06-08)
- “Efficient Attention: Attention With Linear Complexities”, Et Al 2018 (2018-12-04)
- “Efficient Attention: Attention With Linear Complexities [blog]”