Meta-Learning: Learning to Learn Fast
Reptile/FOMAML: On First-Order Meta-Learning Algorithms
An Empirical Model of Large-Batch Training
AUNN: Simple Implementation of Gwern’s AUNN Proposal
One Big Net For Everything
CM3: A Causal Masked Multimodal Model of the Internet
SIREN: Implicit Neural Representations with Periodic Activation Functions
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
NeuralSVG: An Implicit Representation for Text-to-Vector Generation
Compressing multidimensional weather and climate data into neural networks
Image Generators with Conditionally-Independent Pixel Synthesis
Rethinking Patch Dependence for Masked Autoencoders
σ-GPTs: A New Approach to Autoregressive Models
Fourier Neural Operator for Parametric Partial Differential Equations
Neural Ordinary Differential Equations
Perceiver: General Perception with Iterative Attention
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Transformer Memory as a Differentiable Search Index
Large Language Models Struggle to Learn Long-Tail Knowledge
A Neural Corpus Indexer for Document Retrieval
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
FloWaveNet: A Generative Flow for Raw Audio
Efficient Neural Audio Synthesis
Blockwise Parallel Decoding for Deep Autoregressive Models
Mask-Predict: Parallel Decoding of Conditional Masked Language Models
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Meta Reinforcement Learning
backstop#learning-backprop
‘Decision Transformer’ directory
Gato: A Generalist Agent
Dynamic Evaluation of Transformer Language Models
‘MLP NN’ directory
index#convolution-learning
Scaling MLPs: A Tale of Inductive Bias
Real-time Neural Radiance Caching for Path Tracing
Hopfield Networks is All You Need
Buried by the Ash of Vesuvius, These Scrolls Are Being Read for the First Time in Millennia: A Revolutionary American Scientist Is Using Subatomic Physics to Decipher 2,000-Year-Old Texts from the Early Days of Western Civilization
Vesuvius Challenge
https://x.com/CFGeek/status/1700317550859673996
GPT-3 Creative Fiction § Prompts As Programming
MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Linear Transformers Are Secretly Fast Weight Programmers
HyperNetworks
Neural Turing Machines
MetaFun: Meta-Learning with Iterative Functional Updates
RoFormer: Enhanced Transformer with Rotary Position Embedding
Train Short, Test Long: Attention with Linear Biases (ALiBi) Enables Input Length Extrapolation
https://colab.research.google.com/github/murphyka/ml_colabs/blob/main/Simple_MLP_Visualization.ipynb
scaling-hypothesis#blessings-of-scale
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
GANs Didn’t Fail, They Were Abandoned
https://x.com/stephenroller/status/1579993017234382849
Pay Attention to MLPs
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
https://arxiv.org/pdf/2207.10551.pdf#page=7&org=google
Deep Differentiable Logic Gate Networks
Scaling Vision Transformers to 22 Billion Parameters
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Single Headed Attention RNN: Stop Thinking With Your Head
ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Finetuning Pretrained Transformers into RNNs
RWKV: Reinventing RNNs for the Transformer Era
Retentive Network: A Successor to Transformer for Large Language Models
index#transformer-rnn
Computer Optimization: Your Computer Is Faster Than You Think § DL
Efficient Transformers: A Survey
The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
‘continual learning’ directory
Faster SGD training by minibatch persistency
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Direct Feedback Alignment Provides Learning in Deep Neural Networks
Predictive Coding Can Do Exact Backpropagation on Any Neural Network
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Scaling Forward Gradient With Local Losses
Meta Learning Backpropagation And Improving It
design#future-tag-features
sort#binsort
MUX-PLMs: Pre-training Language Models with Data Multiplexing
Progressive Growing of GANs for Improved Quality, Stability, and Variation
‘knowledge distillation’ directory
Net2Net: Accelerating Learning via Knowledge Transfer
SGDR: Stochastic Gradient Descent with Warm Restarts
Active Learning Literature Survey
Bidirectional Learning for Robust Neural Networks
What Are Bayesian Neural Network Posteriors Really Like?
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
‘retrieval AI’ directory
A Neural Corpus Indexer for Document Retrieval
‘discrete diffusion model’ directory
Player of Games
https://github.com/tromp/ChessPositionRanking
ChessPositionRanking/img/2389704906374985477664262349386869232706664089.png at main · tromp/ChessPositionRanking
‘inner monologue (AI)’ directory
CausalLM is not optimal for in-context learning
The Unreasonable Effectiveness of Recurrent Neural Networks
Scaling Scaling Laws with Board Games
Scaling down Deep Learning
Transformer Language Models without Positional Encodings Still Learn Positional Information
RWKV-7 ‘Goose’ with Expressive Dynamic State Evolution
The Belief State Transformer
Do language models plan ahead for future tokens?
https://www.anthropic.com/research/tracing-thoughts-language-model
Hardware hedging scaling risks