- See Also
- Gwern
- “Nenex: A Neural Personal Wiki Idea”, Gwern 2023
- Links
- “Neural Spline Fields for Burst Image Fusion and Layer Separation”, Chugunov et al 2023
- “Test-Time Adaptation of Discriminative Models via Diffusion Generative Feedback”, Prabhudesai et al 2023
- “In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
- “OSD: Online Speculative Decoding”, Liu et al 2023
- “Test-Time Training on Video Streams”, Wang et al 2023
- “TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models”, Hardt & Sun 2023
- “FWL: Meta-Learning Fast Weight Language Models”, Clark et al 2022
- “Don’t Stop the Training: Continuously-Updating Self-Supervised Algorithms Best Account for Auditory Responses in the Cortex”, Orhan et al 2022
- “Reconsidering the Past: Optimizing Hidden States in Language Models”, Yoshida & Gimpel 2021
- “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021 (page 7 org deepmind)
- “Mogrifier LSTM”, Melis et al 2019
- “Dynamic Evaluation of Transformer Language Models”, Krause et al 2019
- “Dynamic Evaluation of Neural Sequence Models”, Krause et al 2017
- “Bayesian Recurrent Neural Networks”, Fortunato et al 2017
- “Learning Simpler Language Models With the Differential State Framework”, Ororbia II et al 2017
- “Neural Episodic Control”, Pritzel et al 2017
- “Multiplicative LSTM for Sequence Modelling”, Krause et al 2016
- “Generating Sequences With Recurrent Neural Networks”, Graves 2013
- “Recurrent Neural Network Based Language Model § Dynamic Evaluation”, Mikolov et al 2010 (page 2)
- “Fast Text Compression With Neural Networks”, Mahoney 2000
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Gwern
“Nenex: A Neural Personal Wiki Idea”, Gwern 2023
Links
“Neural Spline Fields for Burst Image Fusion and Layer Separation”, Chugunov et al 2023
“Test-Time Adaptation of Discriminative Models via Diffusion Generative Feedback”, Prabhudesai et al 2023
“In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
“OSD: Online Speculative Decoding”, Liu et al 2023
“Test-Time Training on Video Streams”, Wang et al 2023
“TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models”, Hardt & Sun 2023
“FWL: Meta-Learning Fast Weight Language Models”, Clark et al 2022
“Don’t Stop the Training: Continuously-Updating Self-Supervised Algorithms Best Account for Auditory Responses in the Cortex”, Orhan et al 2022
“Reconsidering the Past: Optimizing Hidden States in Language Models”, Yoshida & Gimpel 2021
“Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021 (page 7 org deepmind)
“Mogrifier LSTM”, Melis et al 2019
“Dynamic Evaluation of Transformer Language Models”, Krause et al 2019
“Dynamic Evaluation of Neural Sequence Models”, Krause et al 2017
“Bayesian Recurrent Neural Networks”, Fortunato et al 2017
“Learning Simpler Language Models With the Differential State Framework”, Ororbia II et al 2017
“Neural Episodic Control”, Pritzel et al 2017
“Multiplicative LSTM for Sequence Modelling”, Krause et al 2016
“Generating Sequences With Recurrent Neural Networks”, Graves 2013
“Recurrent Neural Network Based Language Model § Dynamic Evaluation”, Mikolov et al 2010 (page 2)
“Fast Text Compression With Neural Networks”, Mahoney 2000
Wikipedia
Miscellaneous
- /doc/ai/nn/dynamic-evaluation/2023-hardt-figure7-bitesperbyteforgpt2large.png
- /doc/ai/nn/dynamic-evaluation/2023-hardt-figure8-bitesperbyteforgptneo.jpg
- https://benkrause.github.io/blog/human-level-text-prediction/
- https://www.latent.space/p/fastai#%C2%A7replacing-fine-tuning-with-continued-pre-training
Link Bibliography
- https://arxiv.org/abs/2307.05014: “Test-Time Training on Video Streams”, Wang et al 2023
- https://arxiv.org/abs/2305.18466: “TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models”, Hardt & Sun 2023
- https://arxiv.org/abs/2212.02475#google: “FWL: Meta-Learning Fast Weight Language Models”, Clark et al 2022
- https://arxiv.org/abs/2112.08653: “Reconsidering the Past: Optimizing Hidden States in Language Models”, Yoshida & Gimpel 2021
- https://arxiv.org/pdf/2102.01951.pdf#page=7&org=deepmind: “Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation”, Lazaridou et al 2021
- https://arxiv.org/abs/1909.01792#deepmind: “Mogrifier LSTM”, Melis et al 2019
- https://arxiv.org/abs/1904.08378: “Dynamic Evaluation of Transformer Language Models”, Krause et al 2019