Bibliography:

  1. Dynamic Evaluation

  2. ‘neural net’ tag

  3. ‘compressed Transformers’ tag

  4. ‘self-attention’ tag

  5. ‘continual learning’ tag

  6. ‘meta-learning’ tag

  7. Nenex: A Neural Personal Wiki Idea

  8. AUNN: Simple Implementation of Gwern’s AUNN Proposal

  9. Emergent properties with repeated examples

  10. Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

  11. Learning to (Learn at Test Time): RNNs with Expressive Hidden States

  12. Instruction Modeling: Instruction Tuning With Loss Over Instructions

  13. Test-Time Augmentation to solve ARC

  14. An accurate and rapidly calibrating speech neuroprosthesis

  15. Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

  16. Neural Spline Fields for Burst Image Fusion and Layer Separation

  17. Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback

  18. In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries

  19. OSD: Online Speculative Decoding

  20. Re-Reading Improves Reasoning in Large Language Models

  21. Test-Time Training on Video Streams

  22. TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models

  23. FWL: Meta-Learning Fast Weight Language Models

  24. Test-Time Training with Masked Autoencoders

  25. Don’t stop the training: continuously-updating self-supervised algorithms best account for auditory responses in the cortex

  26. Reconsidering the Past: Optimizing Hidden States in Language Models

  27. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling

  28. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation

  29. Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

  30. Unsupervised Domain Adaptation through Self-Supervision

  31. Mogrifier LSTM

  32. Dynamic Evaluation of Transformer Language Models

  33. Learning and Evaluating General Linguistic Intelligence

  34. Faster SGD training by minibatch persistency

  35. Continuous Learning in a Hierarchical Multiscale Neural Network

  36. Dynamic Evaluation of Neural Sequence Models

  37. Bayesian Recurrent Neural Networks

  38. Learning Simpler Language Models with the Differential State Framework

  39. Neural Episodic Control

  40. Multiplicative LSTM for sequence modeling

  41. Generating Sequences With Recurrent Neural Networks

  42. Recurrent Neural Network Based Language Model § Dynamic Evaluation

  43. Fast Text Compression with Neural Networks

  44. OpenAI API § Prompt Caching

  45. Yu Sun

  46. design#future-tag-features

  47. 2023-12-31-gwern-mentalgymnasticsmeme-dynamicevaluationvsalternativeapproaches.jpg

  48. 2023-hardt-figure5-bitsperbytegpt2performanceimprovementwhentrainingon50nearestneighborexamples.jpg

  49. 2023-hardt-figure6-perplexitiesdecreasewhentrainingonincreasinglymoreneighborsusinggpt2onthepile.jpg

  50. 2023-hardt-figure7-bitesperbyteforgpt2large.jpg

  51. 2023-hardt-figure8-bitesperbyteforgptneo.jpg

  52. 2023-hardt-figure9-trainingcostofdynamicevaluationonnearestneighborlookups.jpg

  53. 2022-clark-figure2-fwldynamicevaluationimprovesmostonrareorrepeatedtokens.jpg

  54. 2017-krause-figure2-dynamicevaluationrnnpredictionofwikipediaandspanishtextshowingtesttimeadaptation.png

  55. https://arxiv.org/pdf/2102.01951.pdf#page=18&=deepmind

  56. 2b42979ae6f5f70c53a29c1350667ad5d50118d4.pdf#page=18&=deepmind

  57. https://benkrause.github.io/blog/human-level-text-prediction/

  58. 75e685c40b779e00de31359adf6ddfa5012a7d32.html

  59. https://www.latent.space/p/fastai#%C2%A7replacing-fine-tuning-with-continued-pre-training

  60. 1b54d898ad24b02d0faeab34ab6e834adec475d7.html#%C2%A7replacing-fine-tuning-with-continued-pre-training

  61. Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification

  62. https%253A%252F%252Farxiv.org%252Fabs%252F2410.00179.html

  63. Test-Time Augmentation to solve ARC

  64. https%253A%252F%252Flab42.global%252Fcommunity-interview-jack-cole%252F.html

  65. Re-Reading Improves Reasoning in Large Language Models

  66. https%253A%252F%252Farxiv.org%252Fabs%252F2309.06275.html

  67. Test-Time Training on Video Streams

  68. Yu Sun

  69. https%253A%252F%252Farxiv.org%252Fabs%252F2307.05014.html

  70. TTT-NN: Test-Time Training on Nearest Neighbors for Large Language Models

  71. Yu Sun

  72. https%253A%252F%252Farxiv.org%252Fabs%252F2305.18466.html

  73. FWL: Meta-Learning Fast Weight Language Models

  74. https%253A%252F%252Farxiv.org%252Fabs%252F2212.02475%2523google.html

  75. Reconsidering the Past: Optimizing Hidden States in Language Models

  76. https%253A%252F%252Farxiv.org%252Fabs%252F2112.08653.html

  77. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Scaling

  78. https%253A%252F%252Farxiv.org%252Fabs%252F2102.01951%2523scaling%2526org%253Ddeepmind.html

  79. Mind the Gap: Assessing Temporal Generalization in Neural Language Models § Dynamic Evaluation

  80. https%253A%252F%252Farxiv.org%252Fpdf%252F2102.01951%2523page%253D7%2526org%253Ddeepmind.html

  81. Mogrifier LSTM

  82. https%253A%252F%252Farxiv.org%252Fabs%252F1909.01792%2523deepmind.html

  83. Dynamic Evaluation of Transformer Language Models

  84. https%253A%252F%252Farxiv.org%252Fabs%252F1904.08378.html

  85. Dynamic Evaluation of Neural Sequence Models

  86. https%253A%252F%252Farxiv.org%252Fabs%252F1709.07432.html