Hymba: A Hybrid-head Architecture for Small Language Models
State-space models can learn in-context by gradient descent
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
State Soup: In-Context Skill Learning, Retrieval and Mixing
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
An accurate and rapidly calibrating speech neuroprosthesis
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Zoology: Measuring and Improving Recall in Efficient Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling
On prefrontal working memory and hippocampal episodic memory: Unifying memories stored in weights and activation slots
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
Generalization in Sensorimotor Networks Configured with Natural Language Instructions
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
Parallelizing non-linear sequential models over the sequence length
A high-performance neuroprosthesis for speech decoding and avatar control
Retentive Network: A Successor to Transformer for Large Language Models
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Emergence of belief-like representations through reinforcement learning
Model scale versus domain knowledge in statistical forecasting of chaotic systems
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Organic reaction mechanism classification using machine learning
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference
Legged Locomotion in Challenging Terrains using Egocentric Vision
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Perfectly Secure Steganography Using Minimum Entropy Coupling
Semantic scene descriptions as an objective of human vision
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations
Spatial representation by ramping activity of neurons in the retrohippocampal cortex
AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos
Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)
Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers
Semantic projection recovers rich human knowledge of multiple object features from word embeddings
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
General-purpose, long-context autoregressive modeling with Perceiver AR
End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking
Data Scaling Laws in NMT: The Effect of Noise and Architecture
Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies
Learning robust perceptive locomotion for quadrupedal robots in the wild
Inducing Causal Structure for Interpretable Neural Networks (IIT)
Evaluating Distributional Distortion in Neural Language Modeling
An Explanation of In-context Learning as Implicit Bayesian Inference
S4: Efficiently Modeling Long Sequences with Structured State Spaces
LSSL: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection
Recurrent Model-Free RL is a Strong Baseline for Many POMDPs
Photos Are All You Need for Reciprocal Recommendation in Online Dating
Perceiver IO: A General Architecture for Structured Inputs & Outputs
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Scaling End-to-End Models for Large-Scale Multilingual ASR
Sensitivity as a Complexity Measure for Sequence Classification Tasks
ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
When Attention Meets Fast Recurrence: Training SRU++ Language Models with Reduced Compute
Generative Speech Coding with Predictive Variance Regularization
Predictive coding is a consequence of energy efficiency in recurrent neural networks
Distilling Large Language Models into Tiny and Effective Students using pQRNN
Towards Playing Full MOBA Games with Deep Reinforcement Learning
Multimodal dynamics modeling for off-road autonomous vehicles
Learning to Summarize Long Texts with Memory Compression and Transfer
Human-centric Dialog Training via Offline Reinforcement Learning
Deep Reinforcement Learning for Closed-Loop Blood Glucose Control
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Matt Botvinick on the spontaneous emergence of learning algorithms
Cultural influences on word meanings revealed through large-scale semantic alignment
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
High-performance brain-to-text communication via imagined handwriting
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Untangling tradeoffs between recurrence and self-attention in neural networks
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models
Machine Translation of Cortical Activity to Text With an Encoder-Decoder Framework
Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving
Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks
Estimating the deep replicability of scientific findings using human and artificial intelligence
Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling
MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Restoring ancient text using deep learning (Pythia): a case study on Greek epigraphy
R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
MoGlow: Probabilistic and controllable motion synthesis using normalizing flows
Good News, Everyone! Context driven entity-aware captioning for news images
On the Turing Completeness of Modern Neural Network Architectures
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Natural Questions: A Benchmark for Question Answering Research
High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks: Videos
R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Adversarial Reprogramming of Text Classification Neural Networks
This Time with Feeling: Learning Expressive Musical Performance
Character-Level Language Modeling with Deeper Self-Attention
Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme
Accurate Uncertainties for Deep Learning Using Calibrated Regression
The Natural Language Decathlon: Multitask Learning as Question Answering
Know What You Don’t Know: Unanswerable Questions for SQuAD
Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
An Analysis of Neural Language Modeling at Multiple Scales
Learning Longer-term Dependencies in RNNs with Auxiliary Losses
M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search
Overcoming the vanishing gradient problem in plain recurrent networks
ULMFiT: Universal Language Model Fine-tuning for Text Classification
Large-scale comparison of machine learning methods for drug target prediction on ChEMBL
A Flexible Approach to Automated RNN Architecture Generation
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Unsupervised Machine Translation Using Monolingual Corpora Only
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
To prune, or not to prune: exploring the efficacy of pruning for model compression
Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification
N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning
SRU: Simple Recurrent Units for Highly Parallelizable Recurrence
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
Twin Networks: Matching the Future for Sequence Generation
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
On the State-of-the-Art of Evaluation in Neural Language Models
Controlling Linguistic Style Aspects in Neural Language Generation
Towards Synthesizing Complex Programs from Input-Output Examples
Language Generation with Recurrent Generative Adversarial Networks without Pre-training
Biased Importance Sampling for Deep Neural Network Training
Deriving Neural Architectures from Sequence and Graph Kernels
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
A neural network system for transformation of regional cuisine style
Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Get To The Point: Summarization with Pointer-Generator Networks
DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks
Learning Simpler Language Models with the Differential State Framework
I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
Neural Combinatorial Optimization with Reinforcement Learning
Frustratingly Short Attention Spans in Neural Language Modeling
Tuning Recurrent Neural Networks with Reinforcement Learning
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Neural Data Filter for Bootstrapping Stochastic Gradient Descent
Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Learning to Learn without Gradient Descent by Gradient Descent
RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
Hybrid computing using a neural network with dynamic external memory
Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
Achieving Human Parity in Conversational Speech Recognition
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Deep Learning Human Mind for Automated Visual Classification
Full Resolution Image Compression with Recurrent Neural Networks
LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks
Iterative Alternating Neural Attention for Machine Reading
Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
Improving sentence compression by learning to predict gaze
Dynamic Memory Networks for Visual and Textual Question Answering
PlaNet—Photo Geolocation with Convolutional Neural Networks
Learning Distributed Representations of Sentences from Unlabeled Data
Exploring the Limits of Language Modeling § 5.9: Samples from the Model
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
Generative Concatenative Nets Jointly Learn to Write and Classify Reviews
BPEs: Neural Machine Translation of Rare Words with Subword Units
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
The Unreasonable Effectiveness of Recurrent Neural Networks
Deep Neural Networks for Large Vocabulary Handwritten Text Recognition
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews
Neural Machine Translation by Jointly Learning to Align and Translate
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
doc2vec: Distributed Representations of Sentences and Documents
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity
A Focused Backpropagation Algorithm for Temporal Pattern Recognition
Learning Complex, Extended Sequences Using the Principle of History Compression
Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks
Untersuchungen zu dynamischen neuronalen Netzen [Studies of dynamic neural networks]
Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical report CU-CS-495-90]
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
Experimental Analysis of the Real-time Recurrent Learning Algorithm
A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks
Generalization of backpropagation with application to a recurrent gas market model
Generalization of back-propagation to recurrent neural networks
The Utility Driven Dynamic Error Propagation Network (RTRL)
A self-optimizing, non-symmetrical neural net for content addressable memory and pattern recognition
Programming a massively parallel, computation universal system: Static behavior
Safety-First AI for Autonomous Data Center Cooling and Industrial Control
BlinkDL/RWKV-LM: RWKV is an RNN with Transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNN and Transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
minimaxir/textgenrnn: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
Deep Learning for Assisting the Process of Music Composition (part 3)
Agent57 (DeepMind, 2020), Figure 3: timeline of deep-reinforcement-learning milestones [SVG]
Khalifa et al 2017, Example 3: incoherent DeepTingle sample prompted with Moby-Dick's "Call me Ishmael" [PNG]
Krause et al 2017, Figure 2: dynamic-evaluation RNN prediction of Wikipedia and Spanish text, showing test-time adaptation [PNG]
https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html
https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers
https://cprimozic.net/blog/growing-sparse-computational-graphs-with-rnns/
https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
https://magenta.tensorflow.org/blog/2017/06/01/waybackprop
https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
https://patentimages.storage.googleapis.com/57/53/22/91b8a6792dbb1e/US20180204116A1.pdf#deepmind
https://wandb.ai/wandb_fc/articles/reports/Image-to-LaTeX--Vmlldzo1NDQ0MTAx
https://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf#page=67
https://www.lesswrong.com/posts/bD4B2MF7nsGAfH9fj/basic-mathematics-of-predictive-coding
https://www.lesswrong.com/posts/mxa7XZ8ajE2oarWcr/lawrencec-s-shortform#pEqfzPMpqsnhaGrNK
https://www.reddit.com/r/MachineLearning/comments/11nre6t/p_rwkv_14b_is_a_strong_chatbot_despite_only/
https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/
https://www.reddit.com/r/MachineLearning/comments/yxt8sa/r_rwkv4_7b_release_an_attentionfree_rnn_language/