“‘Recurrent Transformers’ Tag”, 2020-11-09:
Bibliography for tag ai/nn/transformer/attention/recurrent, most recent first: 1 related tag, 30 annotations, & 4 links. A minimal code sketch of the segment-recurrence idea shared by many of these papers follows the list.
- “RecurrentGemma: Moving Past Transformers for Efficient Open Language Models”, Botev et al 2024
- “Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models”, Rannen-Triki et al 2024
- “Transformers Are Multi-State RNNs”, Oren et al 2024
- “Think Before You Speak: Training Language Models With Pause Tokens”, Goyal et al 2023
- “Retentive Network: A Successor to Transformer for Large Language Models”, Sun et al 2023
- “Block-State Transformers”, Fathi et al 2023
- “Looped Transformers As Programmable Computers”, Giannou et al 2023
- “FWL: Meta-Learning Fast Weight Language Models”, Clark et al 2022
- “Fine-Tuning Pre-Trained Transformers into Decaying Fast Weights”, Mao 2022
- “Simple Recurrence Improves Masked Language Models”, Lei et al 2022
- “Block-Recurrent Transformers”, Hutchins et al 2022
- “S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Gu et al 2021
- “LSSL: Combining Recurrent, Convolutional, and Continuous-Time Models With Linear State-Space Layers”, Gu et al 2021
- “Do Long-Range Language Models Actually Use Long-Range Context?”, Sun et al 2021
- “Finetuning Pretrained Transformers into RNNs”, Kasai et al 2021
- “When Attention Meets Fast Recurrence: Training SRU++ Language Models With Reduced Compute”, Lei 2021
- “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021
- “Shortformer: Better Language Modeling Using Shorter Inputs”, Press et al 2020
- “Untangling Tradeoffs between Recurrence and Self-Attention in Neural Networks”, Kerg et al 2020
- “Addressing Some Limitations of Transformers With Feedback Memory”, Fan et al 2020
- “DEQ: Deep Equilibrium Models”, Bai et al 2019
- “XLNet: Generalized Autoregressive Pretraining for Language Understanding”, Yang et al 2019
- “Dynamic Evaluation of Transformer Language Models”, Krause et al 2019
- “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
- “Transformer-XL—Combining Transformers and RNNs Into a State-Of-The-Art Language Model”, Horev 2019
- “Universal Transformers”, Dehghani et al 2018
- “Hyperbolic Attention Networks”, Gulcehre et al 2018
- “Improving Neural Language Models With a Continuous Cache”, Grave et al 2016
- “Context Caching”
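
The mechanism running through many of these entries (Transformer-XL, Block-Recurrent Transformers, Shortformer, the continuous-cache models) is segment-level recurrence: hidden states from the previous segment are cached and reused as extra attention context, so a fixed-window transformer becomes recurrent across segments. Below is a minimal NumPy sketch of that idea under simplifying assumptions (single head, no relative positional encoding, no causal mask); all function and variable names are illustrative, not taken from any of these papers' code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(segment, memory, Wq, Wk, Wv):
    """Single-head attention whose keys/values span [memory; segment].

    `memory` is the recurrent state: hidden states cached from the
    previous segment, in the spirit of Transformer-XL's segment-level
    recurrence (Dai et al 2019).
    """
    context = np.concatenate([memory, segment], axis=0)  # (M+T, d)
    q = segment @ Wq              # queries come from the current segment only
    k, v = context @ Wk, context @ Wv  # keys/values also cover the cached memory
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product attention
    return softmax(scores) @ v    # (T, d)

rng = np.random.default_rng(0)
d, seg_len, mem_len = 16, 8, 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

memory = np.zeros((mem_len, d))  # empty recurrent state at sequence start
for segment in rng.standard_normal((4, seg_len, d)):  # a long input, segment by segment
    hidden = attend_with_memory(segment, memory, Wq, Wk, Wv)
    memory = hidden[-mem_len:]   # carry the newest hidden states into the next segment
```

The entries above differ mainly in what the carried state is: raw hidden-state caches (Transformer-XL, Shortformer), learned recurrent block states (Block-Recurrent Transformers), decaying fast-weight or linear-attention state (Retentive Network, the fast-weight papers), or online updates to the weights themselves (the dynamic-evaluation papers).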