See Also
Links
“Retentive Network: A Successor to Transformer for Large Language Models”, Sun et al 2023
“FWL: Meta-Learning Fast Weight Language Models”, Clark et al 2022
“Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”, Mao 2022
“Simple Recurrence Improves Masked Language Models”, Lei et al 2022
“Block-Recurrent Transformers”, Hutchins et al 2022
“S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Gu et al 2021
“LSSL: Combining Recurrent, Convolutional, and Continuous-time Models With Linear State-Space Layers”, Gu et al 2021
“Do Long-Range Language Models Actually Use Long-Range Context?”, Sun et al 2021
“Finetuning Pretrained Transformers into RNNs”, Kasai et al 2021
“When Attention Meets Fast Recurrence: Training SRU++ Language Models With Reduced Compute”, Lei 2021
“Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021 (DeepMind; page 7)
“Shortformer: Better Language Modeling Using Shorter Inputs”, Press et al 2020
“Untangling Tradeoffs between Recurrence and Self-attention in Neural Networks”, Kerg et al 2020
“Addressing Some Limitations of Transformers With Feedback Memory”, Fan et al 2020
“DEQ: Deep Equilibrium Models”, Bai et al 2019
“XLNet: Generalized Autoregressive Pretraining for Language Understanding”, Yang et al 2019
“Dynamic Evaluation of Transformer Language Models”, Krause et al 2019
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
“Transformer-XL—Combining Transformers and RNNs Into a State-of-the-art Language Model”, Horev 2019
“Universal Transformers”, Dehghani et al 2018
“Hyperbolic Attention Networks”, Gulcehre et al 2018
“Improving Neural Language Models With a Continuous Cache”, Grave et al 2016
joeddav
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics (a toy sketch of this greedy nearest-neighbor ordering follows the tag list below). For more details, see the link.
languagemodeling
language-models
transformer-evolution
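To make the nearest-neighbor ordering concrete, here is a minimal sketch, assuming cosine similarity over per-annotation embedding vectors; it is not the site's actual implementation, and the titles and random vectors below are placeholders:

```python
# Toy sketch (not the site's actual code): greedily order annotations so that each
# is followed by its nearest remaining neighbor in embedding space, starting from
# the newest annotation, yielding a gradual progression of topics.
import numpy as np

def sort_by_similarity(annotations, embeddings):
    """annotations: titles, newest first; embeddings: (n, d) array of their embeddings."""
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    remaining = list(range(len(annotations)))
    order = [remaining.pop(0)]  # start from the newest annotation
    while remaining:
        sims = embeddings[remaining] @ embeddings[order[-1]]  # cosine similarities
        order.append(remaining.pop(int(np.argmax(sims))))
    return [annotations[i] for i in order]

# Placeholder data: random vectors stand in for real annotation embeddings.
titles = ["RetNet (2023)", "Block-Recurrent Transformers (2022)",
          "S4 (2021)", "Transformer-XL (2019)"]
vectors = np.random.default_rng(0).normal(size=(len(titles), 8))
print(sort_by_similarity(titles, vectors))
```

Clustering the resulting ordering into labeled sections (such as the tags above) would be a separate step on top of this walk.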
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2212.02475#google: “FWL: Meta-Learning Fast Weight Language Models”, Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi
- https://arxiv.org/abs/2210.04243: “Fine-Tuning Pre-trained Transformers into Decaying Fast Weights”, Huanru Henry Mao
- https://arxiv.org/abs/2203.07852: “Block-Recurrent Transformers”, DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
- https://arxiv.org/abs/2111.00396: “S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Albert Gu, Karan Goel, Christopher Ré
- https://arxiv.org/abs/2109.09115: “Do Long-Range Language Models Actually Use Long-Range Context?”, Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
- https://arxiv.org/pdf/2102.01951.pdf#page=7&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021
- https://arxiv.org/abs/1904.08378: “Dynamic Evaluation of Transformer Language Models”, Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals