-
‘self-attention’ tag
-
LoLCATs: On Low-Rank Linearizing of Large Language Models
-
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
-
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
-
RWKV: Reinventing RNNs for the Transformer Era
-
cosFormer: Rethinking Softmax in Attention
-
Self-attention Does Not Need 𝒪(n²) Memory
-
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method
-
On Learning the Transformer Kernel
-
A Dot Product Attention Free Transformer
-
Luna: Linear Unified Nested Attention
-
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks (EAMLP)
-
Sub-Linear Memory: How to Make Performers SLiM
-
AFT: An Attention Free Transformer
-
LambdaNetworks: Modeling long-range Interactions without Attention
-
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
-
Linformer: Self-Attention with Linear Complexity
-
Efficient Attention: Attention with Linear Complexities
-
Efficient Attention: Attention With Linear Complexities [Blog]
-
https://github.com/idiap/fast-transformers
-
https://manifestai.com/blogposts/faster-after-all/
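-
The common thread across these links is replacing softmax attention's 𝒪(n²) cost with a kernelized, linear-complexity formulation. A minimal PyTorch sketch of the idea, loosely following the elu+1 feature map of "Transformers are RNNs" (this is an illustrative assumption of mine, not any single paper's exact method; tensor layout and names are my own):

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1, as used in Katharopoulos et al. (2020).
    return F.elu(x) + 1

def linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, seq, heads, dim); v: (batch, seq, heads, dim_v)
    q, k = feature_map(q), feature_map(k)
    # Contract K^T V once over the sequence axis: (batch, heads, dim, dim_v).
    # This is the step that makes the cost linear in sequence length.
    kv = torch.einsum('bshd,bshm->bhdm', k, v)
    # Normalizer: phi(Q_i) . sum_j phi(K_j), one scalar per query position.
    z = 1.0 / (torch.einsum('bshd,bhd->bsh', q, k.sum(dim=1)) + eps)
    # Output: phi(Q) (K^T V), rescaled by the normalizer.
    return torch.einsum('bshd,bhdm,bsh->bshm', q, kv, z)

# Example: 1 batch, 1024 tokens, 4 heads, 64-dim keys/values.
q = torch.randn(1, 1024, 4, 64)
k = torch.randn(1, 1024, 4, 64)
v = torch.randn(1, 1024, 4, 64)
out = linear_attention(q, k, v)  # (1, 1024, 4, 64)
```

Because the key-value contraction is computed once and reused for every query, the cost is 𝒪(n·d·d_v) in sequence length n rather than the 𝒪(n²·d) of forming the full attention matrix.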