See Also
Links
- “Accurate Image Restoration With Attention Retractable Transformer (ART)”, 2022-10-04
- “DiNAT: Dilated Neighborhood Attention Transformer”, Hassani & Shi, 2022-09-29
- “Co-Writing Screenplays and Theatre Scripts With Language Models (Dramatron): An Evaluation by Industry Professionals”, Mirowski et al, 2022-09-29
- “Mega: Moving Average Equipped Gated Attention”, Ma et al, 2022-09-21
- “Investigating Efficiently Extending Transformers for Long Input Summarization”, Phang et al, 2022-08-08
- “NAT: Neighborhood Attention Transformer”, Hassani et al, 2022-04-14
- “MaxViT: Multi-Axis Vision Transformer”, Tu et al, 2022-04-04
- “ViS4mer: Long Movie Clip Classification With State-Space Video Models”, Islam & Bertasius, 2022-04-04
- “Hierarchical Perceiver”, Carreira et al, 2022-02-22
- “Transformer Quality in Linear Time”, Hua et al, 2022-02-21
- “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Guo et al, 2021-12-15
- “Simple Local Attentions Remain Competitive for Long-Context Tasks”, 2021-12-14
- “Swin Transformer V2: Scaling Up Capacity and Resolution”, Liu et al, 2021-11-18
- “Restormer: Efficient Transformer for High-Resolution Image Restoration”, Zamir et al, 2021-11-18
- “Hourglass: Hierarchical Transformers Are More Efficient Language Models”, Nawrot et al, 2021-10-26
- “Fastformer: Additive Attention Can Be All You Need”, Wu et al, 2021-08-20
- “Adaptive Multi-Resolution Attention With Linear Complexity”, 2021-08-10
- “Long-Short Transformer (Transformer-LS): Efficient Transformers for Language and Vision”, Zhu et al, 2021-07-05
- “Global Filter Networks for Image Classification”, Rao et al, 2021-07-01
- “HiT: Improved Transformer for High-Resolution GANs”, Zhao et al, 2021-06-14
- “Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling”, Wu et al, 2021-06-02
- “A Multi-Level Attention Model for Evidence-Based Fact Checking”, 2021-06-02
- “Aggregating Nested Transformers”, 2021-05-26
- “Pay Attention to MLPs”, Liu et al, 2021-05-17
- “Fully-Connected Neural Nets”, Gwern Branwen, 2021-04-24
- “MViT: Multiscale Vision Transformers”, Fan et al, 2021-04-22
- “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”, Liu et al, 2021-03-25
- “Generative Adversarial Transformers”, Hudson & Zitnick, 2021-03-01
- “Coordination Among Neural Modules Through a Shared Global Workspace”, Goyal et al, 2021-03-01
- “LazyFormer: Self Attention With Lazy Update”, 2021-02-25
- “CDLM: Cross-Document Language Modeling”, Caciularu et al, 2021-01-02
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, Zhang et al, 2020-10-20
- “Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, 2020-10-14
- “Transformer-QL: A Step Towards Making Transformer Network Quadratically Large”, 2020-09-28
- “Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, 2020-08-16
- “Progressive Generation of Long Text”, 2020-06-28
- “Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing”, Dai et al, 2020-06-05
- “Conformer: Convolution-augmented Transformer for Speech Recognition”, Gulati et al, 2020-05-16
- “Multi-scale Transformer Language Models”, 2020-05-01
- “Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching”, 2020-04-26
- “Lite Transformer With Long-Short Range Attention”, Wu et al, 2020-04-24
- “ETC: Encoding Long and Structured Inputs in Transformers”, Ainslie et al, 2020-04-17
- “Longformer: The Long-Document Transformer”, Beltagy et al, 2020-04-10
- “BP-Transformer: Modelling Long-Range Context via Binary Partitioning”, Ye et al, 2019-11-11
- “Blockwise Self-Attention for Long Document Understanding”, Qiu et al, 2019-11-07
- “Hierarchical Transformers for Multi-Document Summarization”, Liu & Lapata, 2019-05-30
Link Bibliography
- “DiNAT: Dilated Neighborhood Attention Transformer”, Ali Hassani, Humphrey Shi: https://arxiv.org/abs/2209.15001
- “Mega: Moving Average Equipped Gated Attention”, Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer: https://arxiv.org/abs/2209.10655
- “NAT: Neighborhood Attention Transformer”, Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi: https://arxiv.org/abs/2204.07143
- “LongT5: Efficient Text-To-Text Transformer for Long Sequences”, Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang: https://arxiv.org/abs/2112.07916#google
- “Swin Transformer V2: Scaling Up Capacity and Resolution”: https://arxiv.org/abs/2111.09883
- “Long-Short Transformer (Transformer-LS): Efficient Transformers for Language and Vision”, Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro: https://arxiv.org/abs/2107.02192#nvidia
- “Global Filter Networks for Image Classification”, Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou: https://arxiv.org/abs/2107.00645
- “HiT: Improved Transformer for High-Resolution GANs”, Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang: https://arxiv.org/abs/2106.07631#google
- “Pay Attention to MLPs”, Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le: https://arxiv.org/abs/2105.08050#google
- “Fully-Connected Neural Nets”, Gwern Branwen: fc
- “MViT: Multiscale Vision Transformers”, Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer: https://arxiv.org/abs/2104.11227#facebook
- “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”, Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo: https://arxiv.org/abs/2103.14030
- “Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition”, Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu: https://arxiv.org/abs/2010.10504#google