“‘Sparse Transformers’ Tag”, 2020-11-29:
Bibliography for tag ai/nn/transformer/attention/sparsity (most recent first): 2 related tags, 40 annotations, & 4 links (parent).
- See Also
- Links
- “When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models”, et al 2024
- “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, 2024
- “Revisiting the Equivalence of In-Context Learning and Gradient Descent: The Impact of Data Distribution”, et al 2024
- “Zoology: Measuring and Improving Recall in Efficient Language Models”, et al 2023
- “HyperAttention: Long-Context Attention in Near-Linear Time”, et al 2023
- “LongLoRA: Efficient Fine-Tuning of Long-Context Large Language Models”, et al 2023
- “H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models”, et al 2023
- “Unlimiformer: Long-Range Transformers With Unlimited Length Input”, et al 2023
- “How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers”, et al 2022
- “Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?”, et al 2022
- “Random Feature Attention”, et al 2022
- “Sparse Is Enough in Scaling Transformers”, et al 2021
- “You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling”, et al 2021
- “Scatterbrain: Unifying Sparse and Low-Rank Attention Approximation”, et al 2021
- “Combiner: Full Attention Transformer With Sparse Computation Cost”, et al 2021
- “OmniNet: Omnidirectional Representations from Transformers”, et al 2021
- “Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention”, et al 2021
- “Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting”, et al 2020
- “SMYRF: Efficient Attention Using Asymmetric Clustering”, et al 2020
- “FAVOR+: Rethinking Attention With Performers”, et al 2020
- “Cluster-Former: Clustering-Based Sparse Transformer for Long-Range Dependency Encoding”, et al 2020
- “DeepSpeed Sparse Attention”, 2020
- “BigBird: Transformers for Longer Sequences”, et al 2020
- “Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation”, et al 2020
- “Efficient Content-Based Sparse Attention With Routing Transformers”, et al 2020
- “Sparse Sinkhorn Attention”, et al 2020
- “Reformer: The Efficient Transformer”, et al 2020
- “The Reformer—Pushing the Limits of Language Modeling”, 2020
- “Axial Attention in Multidimensional Transformers”, et al 2019
- “Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting”, et al 2019
- “Scaling Autoregressive Video Models”, et al 2019
- “Adaptive Attention Span in Transformers”, et al 2019
- “Generating Long Sequences With Sparse Transformers”, et al 2019
- “Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes Next in a Sequence—Whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the Attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, 2019 (an illustrative sketch of the factorized attention pattern follows this list)
- “Star-Transformer”, et al 2019
- “CCNet: Criss-Cross Attention for Semantic Segmentation”, et al 2018
- “Image Transformer”, et al 2018
- “Constructing Transformers For Longer Sequences With Sparse Attention Methods”
- “A Deep Dive into the Reformer”
- “Optimal Transport and the Sinkhorn Transformer”
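As a rough illustration of what the papers above share, here is a minimal NumPy sketch of a factorized sparse-attention mask in the spirit of Child et al 2019's Sparse Transformer (the OpenAI entry above): each query attends only to a local causal window plus a set of fixed "summary" columns, rather than to every previous position. The window size, stride, and the dense boolean mask are illustrative assumptions for clarity; the paper itself uses blocked custom kernels, not a materialized *n*×*n* mask.

```python
import numpy as np

def sparse_causal_mask(n: int, stride: int) -> np.ndarray:
    """Boolean (n, n) mask: True where query i may attend to key j.
    Combines a sliding local window with fixed 'summary' columns,
    loosely following the fixed factorized pattern of Child et al 2019."""
    i = np.arange(n)[:, None]   # query positions
    j = np.arange(n)[None, :]   # key positions
    causal  = j <= i                         # never attend to the future
    local   = (i - j) < stride               # the most recent `stride` positions
    columns = (j % stride) == (stride - 1)   # every stride-th 'summary' position
    return causal & (local | columns)

def masked_attention(q, k, v, mask):
    """Plain softmax attention with masked-out pairs set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

n, d, stride = 16, 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = masked_attention(q, k, v, sparse_causal_mask(n, stride))
```

With stride ≈ √n, each row of the mask keeps O(√n) entries, so a kernel that skips masked pairs does O(n·√n) work rather than dense attention's O(n²); the other approaches in this list (hashing in Reformer, clustering in the Routing Transformer, random-feature low-rank approximation in Performer) attack the same quadratic cost by different routes.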
- Miscellaneous
- Bibliography