“‘Multi-Scale Transformers’ Tag”, 2021-03-01:
Bibliography for tag ai/nn/transformer/attention/hierarchical, most recent first: 61 annotations & 4 links.
- Links
- “State-Space Models Can Learn In-Context by Gradient Descent”, et al 2024
- “XT: Nested Tokenization for Larger Context in Large Images”, et al 2024
- “A Long-Context Language Model for the Generation of Bacteriophage Genomes”, 2023
- “HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling”, et al 2023
- “Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer”, et al 2023
- “LongNet: Scaling Transformers to 1,000,000,000 Tokens”, et al 2023
- “Bytes Are All You Need: Transformers Operating Directly On File Bytes”, et al 2023
- “Landmark Attention: Random-Access Infinite Context Length for Transformers”, 2023
- “MEGABYTE: Predicting Million-Byte Sequences With Multiscale Transformers”, et al 2023
- “Parallel Context Windows Improve In-Context Learning of Large Language Models”, et al 2022
- “Structured Prompting: Scaling In-Context Learning to 1,000 Examples”, et al 2022
- “Efficient Transformers With Dynamic Token Pooling”, et al 2022
- “Accurate Image Restoration With Attention Retractable Transformer (ART)”, et al 2022
- “Co-Writing Screenplays and Theatre Scripts With Language Models (Dramatron): An Evaluation by Industry Professionals”, et al 2022
- “DiNAT: Dilated Neighborhood Attention Transformer”, 2022
- “Mega: Moving Average Equipped Gated Attention”, et al 2022
- “Investigating Efficiently Extending Transformers for Long Input Summarization”, et al 2022
- “ChordMixer: A Scalable Neural Attention Model for Sequences With Different Lengths”, et al 2022
- “Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention”, et al 2022
- “NAT: Neighborhood Attention Transformer”, et al 2022
- “ViS4mer: Long Movie Clip Classification With State-Space Video Models”, 2022
- “MaxViT: Multi-Axis Vision Transformer”, et al 2022
- “Hierarchical Perceiver”, et al 2022
- “Transformer Quality in Linear Time”, et al 2022
- “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, et al 2021
- “Simple Local Attentions Remain Competitive for Long-Context Tasks”, et al 2021
- “Restormer: Efficient Transformer for High-Resolution Image Restoration”, et al 2021
- “Swin Transformer V2: Scaling Up Capacity and Resolution”, et al 2021
- “Hourglass: Hierarchical Transformers Are More Efficient Language Models”, et al 2021
- “Fastformer: Additive Attention Can Be All You Need”, et al 2021
- “AdaMRA: Adaptive Multi-Resolution Attention With Linear Complexity”, et al 2021
- “Long-Short Transformer (Transformer-LS): Efficient Transformers for Language and Vision”, et al 2021
- “Global Filter Networks for Image Classification”, et al 2021
- “HiT: Improved Transformer for High-Resolution GANs”, et al 2021
- “A Multi-Level Attention Model for Evidence-Based Fact Checking”, et al 2021
- “Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling”, et al 2021
- “Aggregating Nested Transformers”, et al 2021
- “Pay Attention to MLPs”, et al 2021
- “MViT: Multiscale Vision Transformers”, et al 2021
- “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”, et al 2021
- “Coordination Among Neural Modules Through a Shared Global Workspace”, et al 2021
- “Generative Adversarial Transformers”, 2021
- “LazyFormer: Self Attention With Lazy Update”, et al 2021
- “CDLM: Cross-Document Language Modeling”, et al 2021
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, et al 2020
- “Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, et al 2020
- “Transformer-QL: A Step Towards Making Transformer Network Quadratically Large”, 2020
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, et al 2020
- “Progressive Generation of Long Text”, et al 2020
- “Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing”, et al 2020
- “Conformer: Convolution-Augmented Transformer for Speech Recognition”, et al 2020
- “Multi-Scale Transformer Language Models”, et al 2020
- “Beyond 512 Tokens: Siamese Multi-Depth Transformer-Based Hierarchical Encoder for Long-Form Document Matching”, et al 2020
- “Lite Transformer With Long-Short Range Attention”, et al 2020
- “ETC: Encoding Long and Structured Inputs in Transformers”, et al 2020
- “Longformer: The Long-Document Transformer”, et al 2020
- “BP-Transformer: Modeling Long-Range Context via Binary Partitioning”, et al 2019
- “Blockwise Self-Attention for Long Document Understanding”, et al 2019
- “Hierarchical Transformers for Multi-Document Summarization”, 2019
- “Hierarchical Multiscale Recurrent Neural Networks”, et al 2016
- “A Clockwork RNN”, et al 2014