‘Transformer matrix optimizations’ directory
- See Also
- Links
 - “EvaByte: Efficient Byte-Level Language Models at Scale: Introducing EvaByte, an Efficient and Strong Byte-Level Language Model”, Zheng et al 2025
 - “LoLCATs: On Low-Rank Linearizing of Large Language Models”, Zhang et al 2024
 - “SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Xie et al 2024
 - “Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers”, Gu et al 2024
 - “RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
 - “CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
 - “Self-Attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
 - “Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
 - “On Learning the Transformer Kernel”, Chowdhury et al 2021
 - “A Dot Product Attention Free Transformer”, Zhai et al 2021
 - “AFT: An Attention Free Transformer”, Zhai et al 2021
 - “Luna: Linear Unified Nested Attention”, Ma et al 2021
 - “Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
 - “Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
 - “LambdaNetworks: Modeling Long-Range Interactions without Attention”, Bello 2020
 - “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
 - “Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
 - “Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
 - “Efficient Attention: Attention With Linear Complexities [Blog]”
- Sort By Magic
- Miscellaneous
- Bibliography
 
See Also
Links
“EvaByte: Efficient Byte-Level Language Models at Scale: Introducing EvaByte, an Efficient and Strong Byte-Level Language Model”, Zheng et al 2025
“LoLCATs: On Low-Rank Linearizing of Large Language Models”, Zhang et al 2024
“SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”, Xie et al 2024
“Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers”, Gu et al 2024
“RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
“CosFormer: Rethinking Softmax in Attention”, Qin et al 2022
“Self-Attention Does Not Need 𝒪(n²) Memory”, Rabe & Staats 2021
“Skyformer: Remodel Self-Attention With Gaussian Kernel and Nyström Method”, Chen et al 2021
“On Learning the Transformer Kernel”, Chowdhury et al 2021
“A Dot Product Attention Free Transformer”, Zhai et al 2021
“AFT: An Attention Free Transformer”, Zhai et al 2021
“Luna: Linear Unified Nested Attention”, Ma et al 2021
“Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks (EAMLP)”, Guo et al 2021
“Sub-Linear Memory: How to Make Performers SLiM”, Likhosherstov et al 2020
“LambdaNetworks: Modeling Long-Range Interactions without Attention”, Bello 2020
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
“Linformer: Self-Attention With Linear Complexity”, Wang et al 2020
“Efficient Attention: Attention With Linear Complexities”, Shen et al 2018
“Efficient Attention: Attention With Linear Complexities [Blog]”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
efficient-attention
attention-free
linear-attention
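As a rough illustration of the nearest-neighbor ordering described above, here is a minimal Python sketch; it is not the site's actual implementation, and it assumes unit-normalized annotation embeddings with a toy random stand-in dataset (the clustering into tags would be a separate step, e.g. k-means over the same embeddings):

```python
import numpy as np

def nearest_neighbor_order(embeddings: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor chaining over annotation embeddings:
    begin at `start` (the newest annotation) and repeatedly append the
    most similar annotation not yet placed, giving a topic progression."""
    n = len(embeddings)
    sims = embeddings @ embeddings.T  # pairwise cosine similarities (unit vectors)
    order, visited = [start], {start}
    while len(order) < n:
        last = order[-1]
        # pick the most similar annotation not yet in the ordering
        _, nxt = max((sims[last, j], j) for j in range(n) if j not in visited)
        order.append(nxt)
        visited.add(nxt)
    return order

# Toy usage: 5 random stand-in embeddings, index 0 playing the newest annotation.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(nearest_neighbor_order(emb))
```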
Miscellaneous
Bibliography
https://arxiv.org/abs/2410.10629#nvidia: “SANA: Efficient High-Resolution Image Synthesis With Linear Diffusion Transformers”
https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era”
https://openreview.net/forum?id=JVR4JswsEM: “A Dot Product Attention Free Transformer”
https://arxiv.org/abs/1812.01243#sensetime: “Efficient Attention: Attention With Linear Complexities”