‘Transformer matrix optimizations’ tag
- See Also
- Links
- “LoLCATs: On Low-Rank Linearizing of Large Language Models”, Zhang et al 2024
- “SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Xie et al 2024
- “Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers”, Gu et al 2024
- “RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
- “CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
- “Self-Attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
- “Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
- “On Learning the Transformer Kernel”, Chowdhury et al 2021
- “A Dot Product Attention Free Transformer”, Zhai et al 2021
- “Luna: Linear Unified Nested Attention”, Ma et al 2021
- “Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
- “Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
- “AFT: An Attention Free Transformer”, Anonymous 2020
- “LambdaNetworks: Modeling Long-Range Interactions without Attention”, Bello 2020
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
- “Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
- “Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
- “Efficient Attention: Attention With Linear Complexities [Blog]”
- Sort By Magic
- Miscellaneous
- Bibliography
See Also
Links
“LoLCATs: On Low-Rank Linearizing of Large Language Models”, Zhang et al 2024
“SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Xie et al 2024
“Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers”, Gu et al 2024
“RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
“CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
“Self-Attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
“Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
“On Learning the Transformer Kernel”, Chowdhury et al 2021
“A Dot Product Attention Free Transformer”, Zhai et al 2021
“Luna: Linear Unified Nested Attention”, Ma et al 2021
“Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
“Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
“AFT: An Attention Free Transformer”, Anonymous 2020
“LambdaNetworks: Modeling Long-Range Interactions without Attention”, Bello 2020
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
“Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
“Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
“Efficient Attention: Attention With Linear Complexities [Blog]”
Sort By Magic
Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, the embedding of each annotation is used to find its nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
efficient-attention
attention-free
linear-attention
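The nearest-neighbor progression described above can be sketched as a greedy walk through embedding space (a minimal illustration assuming each annotation is an embedding vector; the function name and greedy strategy are illustrative assumptions, not the site’s actual implementation):

```python
import numpy as np

def greedy_topic_order(embeddings):
    """Order items so each follows its most-similar unvisited neighbor,
    starting from item 0 (the newest annotation). Uses cosine similarity."""
    # Normalize rows so dot products equal cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]
    remaining = set(range(1, len(emb)))
    while remaining:
        current = emb[order[-1]]
        # Pick the unvisited annotation most similar to the current one.
        best = max(remaining, key=lambda i: float(current @ emb[i]))
        order.append(best)
        remaining.remove(best)
    return order

# Toy example: two pairs of similar vectors end up adjacent in the ordering.
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
print(greedy_topic_order(vecs))  # [0, 2, 3, 1]
```

Clustering the resulting ordered list into labeled sections (as in the tag groups above) would then be a separate step, e.g. cutting the sequence where similarity between consecutive items drops.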
Miscellaneous
Bibliography
- https://arxiv.org/abs/2410.10629#nvidia : “SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Xie et al 2024
- https://arxiv.org/abs/2305.13048 : “RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
- https://openreview.net/forum?id=JVR4JswsEM : “A Dot Product Attention Free Transformer”, Zhai et al 2021
- https://arxiv.org/abs/1812.01243#sensetime : “Efficient Attention: Attention With Linear Complexities”, Shen et al 2018