- See Also
- Links
- “RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
- “CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
- “Self-attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
- “Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
- “On Learning the Transformer Kernel”, Chowdhury et al 2021
- “A Dot Product Attention Free Transformer”, Zhai et al 2021
- “Luna: Linear Unified Nested Attention”, Ma et al 2021
- “Beyond Self-attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
- “Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
- “LambdaNetworks: Modeling Long-range Interactions without Attention”, Bello 2020
- “AFT: An Attention Free Transformer”, Anonymous 2020
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
- “Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
- “Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
- “Efficient Attention: Attention With Linear Complexities [blog]”
- Sort By Magic
- Miscellaneous
- Link Bibliography
See Also
Links
“RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
“CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
“Self-attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
“Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
“On Learning the Transformer Kernel”, Chowdhury et al 2021
“A Dot Product Attention Free Transformer”, Zhai et al 2021
“Luna: Linear Unified Nested Attention”, Ma et al 2021
“Beyond Self-attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
“Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
“LambdaNetworks: Modeling Long-range Interactions without Attention”, Bello 2020
“AFT: An Attention Free Transformer”, Anonymous 2020
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
“Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
“Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
“Efficient Attention: Attention With Linear Complexities [blog]”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, the sort uses each annotation's embedding to chain together nearest-neighbor annotations, creating a gradual progression of topics (a minimal sketch of this procedure follows the tag list below).
attention-free
linear-attention
autoregressive
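The clustering-and-chaining described above can be illustrated roughly as follows. This is a minimal sketch, not the actual site code: the function name `sort_by_magic`, the use of scikit-learn's KMeans, and the assumption that each annotation carries a `"date"` field and a precomputed embedding vector are all illustrative choices, not details confirmed by the source.

```python
import numpy as np
from sklearn.cluster import KMeans

def sort_by_magic(annotations, embeddings, n_clusters=3):
    """Hypothetical sketch: cluster annotations by embedding similarity,
    then order each cluster as a greedy nearest-neighbor chain that starts
    from the newest annotation, giving a smooth progression of topics."""
    X = np.asarray(embeddings, dtype=float)
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize so dot product = cosine similarity
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

    sections = []
    for c in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Start from the newest annotation in the cluster ...
        idx.sort(key=lambda i: annotations[i]["date"], reverse=True)
        chain, remaining = [idx[0]], set(idx[1:])
        # ... then repeatedly append the nearest unvisited neighbor of the
        # most recently added item, so adjacent entries stay topically close.
        while remaining:
            last = chain[-1]
            nxt = max(remaining, key=lambda i: float(X[last] @ X[i]))
            chain.append(nxt)
            remaining.remove(nxt)
        sections.append([annotations[i] for i in chain])
    return sections
```

Section labels such as “attention-free” or “linear-attention” would then be assigned to each cluster by a separate auto-labeling step, which this sketch omits.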
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
- https://openreview.net/forum?id=JVR4JswsEM: “A Dot Product Attention Free Transformer”, Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, Joshua M. Susskind