Bibliography (143):

  1. Meta-Learning: Learning to Learn Fast

  2. Reptile/FOMAML: On First-Order Meta-Learning Algorithms

  3. An Empirical Model of Large-Batch Training

  4. AUNN: Simple Implementation of Gwern’s AUNN Proposal

  5. One Big Net For Everything

  6. CM3: A Causal Masked Multimodal Model of the Internet

  7. SIREN: Implicit Neural Representations with Periodic Activation Functions

  8. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

  9. NeuralSVG: An Implicit Representation for Text-to-Vector Generation

  10. Compressing multidimensional weather and climate data into neural networks

  11. Image Generators with Conditionally-Independent Pixel Synthesis

  12. Rethinking Patch Dependence for Masked Autoencoders

  13. σ-GPTs: A New Approach to Autoregressive Models

  14. Fourier Neural Operator for Parametric Partial Differential Equations

  15. Neural Ordinary Differential Equations

  16. Perceiver: General Perception with Iterative Attention

  17. Perceiver IO: A General Architecture for Structured Inputs & Outputs

  18. Transformer Memory as a Differentiable Search Index

  19. Large Language Models Struggle to Learn Long-Tail Knowledge

  20. A Neural Corpus Indexer for Document Retrieval

  21. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications

  22. Parallel WaveNet: Fast High-Fidelity Speech Synthesis

  23. FloWaveNet: A Generative Flow for Raw Audio

  24. Efficient Neural Audio Synthesis

  25. Blockwise Parallel Decoding for Deep Autoregressive Models

  26. Mask-Predict: Parallel Decoding of Conditional Masked Language Models

  27. Insertion Transformer: Flexible Sequence Generation via Insertion Operations

  28. Meta Reinforcement Learning

  29. backstop#learning-backprop

  30. ‘Decision Transformer’ directory

  31. Gato: A Generalist Agent

  32. Dynamic Evaluation of Transformer Language Models

  33. ‘MLP NN’ directory

  34. index#convolution-learning

  35. Scaling MLPs: A Tale of Inductive Bias

  36. Real-time Neural Radiance Caching for Path Tracing

  37. Hopfield Networks is All You Need

  38. Buried by the Ash of Vesuvius, These Scrolls Are Being Read for the First Time in Millennia: A Revolutionary American Scientist Is Using Subatomic Physics to Decipher 2,000-Year-Old Texts from the Early Days of Western Civilization

  39. Vesuvius Challenge

  40. https://x.com/CFGeek/status/1700317550859673996

  41. GPT-3 Creative Fiction § Prompts As Programming

  42. MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

  43. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

  44. One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention

  45. Linear Transformers Are Secretly Fast Weight Programmers

  46. HyperNetworks

  47. Neural Turing Machines

  48. MetaFun: Meta-Learning with Iterative Functional Updates

  49. RoFormer: Enhanced Transformer with Rotary Position Embedding

  50. Train Short, Test Long: Attention with Linear Biases (ALiBi) Enables Input Length Extrapolation

  51. https://colab.research.google.com/github/murphyka/ml_colabs/blob/main/Simple_MLP_Visualization.ipynb

  52. scaling-hypothesis#blessings-of-scale

  53. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

  54. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  55. GANs Didn’t Fail, They Were Abandoned

  56. https://x.com/stephenroller/status/1579993017234382849

  57. Pay Attention to MLPs

  58. MLP Architectures for Vision-and-Language Modeling: An Empirical Study

  59. https://arxiv.org/pdf/2207.10551.pdf#page=7&org=google

  60. Deep Differentiable Logic Gate Networks

  61. Scaling Vision Transformers to 22 Billion Parameters

  62. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  63. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  64. Single Headed Attention RNN: Stop Thinking With Your Head

  65. ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

  66. Finetuning Pretrained Transformers into RNNs

  67. RWKV: Reinventing RNNs for the Transformer Era

  68. Retentive Network: A Successor to Transformer for Large Language Models

  69. index#transformer-rnn

  70. Computer Optimization: Your Computer Is Faster Than You Think § DL

  71. Efficient Transformers: A Survey

  72. The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

  73. ‘continual learning’ directory

  74. Faster SGD training by minibatch persistency

  75. Towards Scaling Difference Target Propagation by Learning Backprop Targets

  76. Direct Feedback Alignment Provides Learning in Deep Neural Networks

  77. Predictive Coding Can Do Exact Backpropagation on Any Neural Network

  78. PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

  79. Scaling Forward Gradient With Local Losses

  80. Meta Learning Backpropagation And Improving It

  81. design#future-tag-features

  82. sort#binsort

  83. MUX-PLMs: Pre-training Language Models with Data Multiplexing

  84. Progressive Growing of GANs for Improved Quality, Stability, and Variation

  85. ‘knowledge distillation’ directory

  86. Net2Net: Accelerating Learning via Knowledge Transfer

  87. SGDR: Stochastic Gradient Descent with Warm Restarts

  88. Active Learning Literature Survey

  89. Bidirectional Learning for Robust Neural Networks

  90. What Are Bayesian Neural Network Posteriors Really Like?

  91. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

  92. ‘retrieval AI’ directory

  93. A Neural Corpus Indexer for Document Retrieval

  94. ‘discrete diffusion model’ directory

  95. Player of Games

  96. https://github.com/tromp/ChessPositionRanking

  97. ChessPositionRanking/img/2389704906374985477664262349386869232706664089.png at main · tromp/ChessPositionRanking

  98. ‘inner monologue (AI)’ directory

  99. CausalLM is not optimal for in-context learning

  100. The Unreasonable Effectiveness of Recurrent Neural Networks

  101. Scaling Scaling Laws with Board Games

  102. Scaling down Deep Learning

  103. Transformer Language Models without Positional Encodings Still Learn Positional Information

  104. RWKV-7 ‘Goose’ with Expressive Dynamic State Evolution

  105. The Belief State Transformer

  106. Do language models plan ahead for future tokens?

  107. https://www.anthropic.com/research/tracing-thoughts-language-model

  108. Hardware hedging scaling risks