Bibliography:

  1. ‘neural net’ tag

  2. ‘MLP NN’ tag

  3. ‘compressed Transformers’ tag

  4. ‘recurrent Transformers’ tag

  5. ‘Transformer’ tag

  6. ‘MuZero’ tag

  7. Absolute Unit NNs: Regression-Based MLPs for Everything

  8. RNN Metadata for Mimicking Author Style

  9. FlashRNN: Optimizing Traditional RNNs on Modern Hardware

  10. Hymba: A Hybrid-head Architecture for Small Language Models

  11. State-space models can learn in-context by gradient descent

  12. Were RNNs All We Needed?

  13. The Mamba in the Llama: Distilling and Accelerating Hybrid Models

  14. handwriter.ttf: Handwriting Synthesis With Harfbuzz WASM

  15. Learning to (Learn at Test Time): RNNs with Expressive Hidden States

  16. An Empirical Study of Mamba-based Language Models

  17. State Soup: In-Context Skill Learning, Retrieval and Mixing

  18. Grokfast: Accelerated Grokking by Amplifying Slow Gradients

  19. Attention as an RNN

  20. xLSTM: Extended Long Short-Term Memory

  21. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

  22. The Illusion of State in State-Space Models

  23. An accurate and rapidly calibrating speech neuroprosthesis

  24. Does Transformer Interpretability Transfer to RNNs?

  25. Mechanistic Design and Scaling of Hybrid Architectures

  26. GLE: Backpropagation through space, time, and the brain

  27. ZigMa: Zigzag Mamba Diffusion Model

  28. RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

  29. MambaByte: Token-free Selective State Space Model

  30. MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

  31. Evolving Reservoirs for Meta Reinforcement Learning

  32. Zoology: Measuring and Improving Recall in Efficient Language Models

  33. Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  34. Diffusion Models Without Attention

  35. Learning few-shot imitation as cultural transmission

  36. Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

  37. HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling

  38. On prefrontal working memory and hippocampal episodic memory: Unifying memories stored in weights and activation slots

  39. GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling

  40. ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

  41. Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

  42. Generalization in Sensorimotor Networks Configured with Natural Language Instructions

  43. Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors

  44. Parallelizing non-linear sequential models over the sequence length

  45. A high-performance neuroprosthesis for speech decoding and avatar control

  46. Learning to Model the World with Language

  47. Retentive Network: A Successor to Transformer for Large Language Models

  48. Using Sequences of Life-events to Predict Human Lives

  49. Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

  50. RWKV: Reinventing RNNs for the Transformer Era

  51. Emergence of belief-like representations through reinforcement learning

  52. Model scale versus domain knowledge in statistical forecasting of chaotic systems

  53. Resurrecting Recurrent Neural Networks for Long Sequences

  54. SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

  55. Organic reaction mechanism classification using machine learning

  56. A high-performance speech neuroprosthesis

  57. Hungry Hungry Hippos: Towards Language Modeling with State Space Models

  58. Pretraining Without Attention

  59. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

  60. Melting Pot 2.0

  61. VeLO: Training Versatile Learned Optimizers by Scaling Up

  62. Legged Locomotion in Challenging Terrains using Egocentric Vision

  63. Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

  64. Perfectly Secure Steganography Using Minimum Entropy Coupling

  65. Transformers Learn Shortcuts to Automata

  66. Omnigrok: Grokking Beyond Algorithmic Data

  67. Semantic scene descriptions as an objective of human vision

  68. Benchmarking Compositionality with Formal Languages

  69. Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

  70. PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

  71. Spatial representation by ramping activity of neurons in the retrohippocampal cortex

  72. Neural Networks and the Chomsky Hierarchy

  73. BYOL-Explore: Exploration by Bootstrapped Prediction

  74. AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos

  75. Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)

  76. Simple Recurrence Improves Masked Language Models

  77. Sequencer: Deep LSTM for Image Classification

  78. Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

  79. Semantic projection recovers rich human knowledge of multiple object features from word embeddings

  80. Block-Recurrent Transformers

  81. All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

  82. Retrieval-Augmented Reinforcement Learning

  83. Learning by Directional Gradient Descent

  84. General-purpose, long-context autoregressive modeling with Perceiver AR

  85. End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

  86. Data Scaling Laws in NMT: The Effect of Noise and Architecture

  87. Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

  88. Learning robust perceptive locomotion for quadrupedal robots in the wild

  89. Inducing Causal Structure for Interpretable Neural Networks (IIT)

  90. Evaluating Distributional Distortion in Neural Language Modeling

  91. Gradients are Not All You Need

  92. An Explanation of In-context Learning as Implicit Bayesian Inference

  93. S4: Efficiently Modeling Long Sequences with Structured State Spaces

  94. Minimum Description Length Recurrent Neural Networks

  95. LSSL: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

  96. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

  97. Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

  98. Photos Are All You Need for Reciprocal Recommendation in Online Dating

  99. Perceiver IO: A General Architecture for Structured Inputs & Outputs

  100. PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

  101. Shelley: A Crowd-sourced Collaborative Horror Writer

  102. Ten Lessons From Three Generations Shaped Google’s TPUv4i

  103. RASP: Thinking Like Transformers

  104. Scaling Laws for Acoustic Models

  105. Scaling End-to-End Models for Large-Scale Multilingual ASR

  106. Sensitivity as a Complexity Measure for Sequence Classification Tasks

  107. ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

  108. Finetuning Pretrained Transformers into RNNs

  109. Pretrained Transformers as Universal Computation Engines

  110. Perceiver: General Perception with Iterative Attention

  111. When Attention Meets Fast Recurrence: Training SRU++ Language Models with Reduced Compute

  112. Generative Speech Coding with Predictive Variance Regularization

  113. Predictive coding is a consequence of energy efficiency in recurrent neural networks

  114. Deep Residual Learning in Spiking Neural Networks

  115. Distilling Large Language Models into Tiny and Effective Students using pQRNN

  116. Meta Learning Backpropagation And Improving It

  117. On the Binding Problem in Artificial Neural Networks

  118. A Recurrent Vision-and-Language BERT for Navigation

  119. Towards Playing Full MOBA Games with Deep Reinforcement Learning

  120. Multimodal dynamics modeling for off-road autonomous vehicles

  121. Adversarial vulnerabilities of human decision-making

  122. Learning to Summarize Long Texts with Memory Compression and Transfer

  123. Human-centric Dialog Training via Offline Reinforcement Learning

  124. AFT: An Attention Free Transformer

  125. Deep Reinforcement Learning for Closed-Loop Blood Glucose Control

  126. HiPPO: Recurrent Memory with Optimal Polynomial Projections

  127. Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

  128. Matt Botvinick on the spontaneous emergence of learning algorithms

  129. Cultural influences on word meanings revealed through large-scale semantic alignment

  130. DeepSinger: Singing Voice Synthesis with Data Mined From the Web

  131. High-performance brain-to-text communication via imagined handwriting

  132. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

  133. The Recurrent Neural Tangent Kernel

  134. Untangling tradeoffs between recurrence and self-attention in neural networks

  135. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

  136. Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

  137. Syntactic Structure from Deep Learning

  138. Agent57: Outperforming the human Atari benchmark

  139. Machine Translation of Cortical Activity to Text With an Encoder-Decoder Framework

  140. Learning-based Memory Allocation for C++ Server Workloads

  141. Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

  142. Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

  143. Scaling Laws for Neural Language Models

  144. Estimating the deep replicability of scientific findings using human and artificial intelligence

  145. Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models

  146. Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

  147. SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling

  148. Single Headed Attention RNN: Stop Thinking With Your Head

  149. Excavate

  150. MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

  151. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

  152. High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

  153. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

  154. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

  155. Mixed-Signal Neuromorphic Processors: Quo vadis?

  156. Restoring ancient text using deep learning (Pythia): a case study on Greek epigraphy

  157. Mogrifier LSTM

  158. R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

  159. Language Modeling State-of-the-art leaderboards

  160. Metalearned Neural Memory

  161. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

  162. Generating Text with Recurrent Neural Networks

  163. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  164. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

  165. MoGlow: Probabilistic and controllable motion synthesis using normalizing flows

  166. Reinforcement Learning, Fast and Slow

  167. Meta-learners’ learning dynamics are unlike learners’

  168. Speech synthesis from neural decoding of spoken sentences

  169. Good News, Everyone! Context driven entity-aware captioning for news images

  170. Surrogate Gradient Learning in Spiking Neural Networks

  171. On the Turing Completeness of Modern Neural Network Architectures

  172. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  173. Natural Questions: A Benchmark for Question Answering Research

  174. High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks: Videos

  175. Bayesian Layers: A Module for Neural Network Uncertainty

  176. Meta-Learning: Learning to Learn Fast

  177. Piano Genie

  178. Learning Recurrent Binary/Ternary Weights

  179. R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

  180. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

  181. Adversarial Reprogramming of Text Classification Neural Networks

  182. Object Hallucination in Image Captioning

  183. This Time with Feeling: Learning Expressive Musical Performance

  184. Character-Level Language Modeling with Deeper Self-Attention

  185. General Value Function Networks

  186. Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme

  187. Universal Transformers

  188. Accurate Uncertainties for Deep Learning Using Calibrated Regression

  189. The Natural Language Decathlon: Multitask Learning as Question Answering

  190. Neural Ordinary Differential Equations

  191. Know What You Don’t Know: Unanswerable Questions for SQuAD

  192. DVRL: Deep Variational Reinforcement Learning for POMDPs

  193. Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

  194. Hierarchical Neural Story Generation

  195. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

  196. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

  197. A Tree Search Algorithm for Sequence Labeling

  198. An Analysis of Neural Language Modeling at Multiple Scales

  199. Reviving and Improving Recurrent Back-Propagation

  200. Learning Memory Access Patterns

  201. Learning Longer-term Dependencies in RNNs with Auxiliary Losses

  202. One Big Net For Everything

  203. Efficient Neural Audio Synthesis

  204. Deep contextualized word representations

  205. M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

  206. Overcoming the vanishing gradient problem in plain recurrent networks

  207. ULMFiT: Universal Language Model Fine-tuning for Text Classification

  208. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL

  209. A Flexible Approach to Automated RNN Architecture Generation

  210. The NarrativeQA Reading Comprehension Challenge

  211. Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

  212. Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

  213. Evaluating prose style transfer with the Bible

  214. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

  215. Neural Speed Reading via Skim-RNN

  216. Unsupervised Machine Translation Using Monolingual Corpora Only

  217. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks

  218. Mixed Precision Training

  219. To prune, or not to prune: exploring the efficacy of pruning for model compression

  220. Dynamic Evaluation of Neural Sequence Models

  221. Online Learning of a Memory for Learning Rates

  222. Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification

  223. N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

  224. SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

  225. Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

  226. Twin Networks: Matching the Future for Sequence Generation

  227. Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

  228. Revisiting Activation Regularization for Language RNNs

  229. Bayesian Sparsification of Recurrent Neural Networks

  230. On the State-of-the-Art of Evaluation in Neural Language Models

  231. Controlling Linguistic Style Aspects in Neural Language Generation

  232. Device Placement Optimization with Reinforcement Learning

  233. Six Challenges for Neural Machine Translation

  234. Towards Synthesizing Complex Programs from Input-Output Examples

  235. Language Generation with Recurrent Generative Adversarial Networks without Pre-training

  236. Biased Importance Sampling for Deep Neural Network Training

  237. Deriving Neural Architectures from Sequence and Graph Kernels

  238. A Deep Reinforced Model for Abstractive Summarization

  239. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

  240. DeepTingle

  241. A neural network system for transformation of regional cuisine style

  242. Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

  243. Adversarial Neural Machine Translation

  244. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

  245. Learning to Reason: End-to-End Module Networks for Visual Question Answering

  246. Exploring Sparsity in Recurrent Neural Networks

  247. Get To The Point: Summarization with Pointer-Generator Networks

  248. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

  249. Bayesian Recurrent Neural Networks

  250. Recurrent Environment Simulators

  251. Learning to Generate Reviews and Discovering Sentiment

  252. Learning Simpler Language Models with the Differential State Framework

  253. I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

  254. Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

  255. Learned Optimizers that Scale and Generalize

  256. Parallel Multiscale Autoregressive Density Estimation

  257. Tracking the World State with Recurrent Entity Networks

  258. Optimization as a Model for Few-Shot Learning

  259. Neural Combinatorial Optimization with Reinforcement Learning

  260. Frustratingly Short Attention Spans in Neural Language Modeling

  261. Tuning Recurrent Neural Networks with Reinforcement Learning

  262. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

  263. Neural Data Filter for Bootstrapping Stochastic Gradient Descent

  264. Learning the Enigma With Recurrent Neural Networks

  265. Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization

  266. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

  267. Improving Neural Language Models with a Continuous Cache

  268. NewsQA: A Machine Comprehension Dataset

  270. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

  271. Learning to Learn without Gradient Descent by Gradient Descent

  272. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning

  273. DeepCoder: Learning to Write Programs

  274. QRNNs: Quasi-Recurrent Neural Networks

  275. Neural Architecture Search with Reinforcement Learning

  276. Bidirectional Attention Flow for Machine Comprehension

  277. Hybrid computing using a neural network with dynamic external memory

  278. Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

  279. Using Fast Weights to Attend to the Recent Past

  280. Achieving Human Parity in Conversational Speech Recognition

  281. VPN: Video Pixel Networks

  282. HyperNetworks

  283. Pointer Sentinel Mixture Models

  284. Multiplicative LSTM for sequence modeling

  285. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

  286. Image-to-Markup Generation with Coarse-to-Fine Attention

  287. Hierarchical Multiscale Recurrent Neural Networks

  288. Deep Learning Human Mind for Automated Visual Classification

  289. Using the Output Embedding to Improve Language Models

  290. Full Resolution Image Compression with Recurrent Neural Networks

  291. Decoupled Neural Interfaces using Synthetic Gradients

  292. Clockwork Convnets for Video Semantic Segmentation

  293. Layer Normalization

  294. LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

  295. Learning to learn by gradient descent by gradient descent

  296. Iterative Alternating Neural Attention for Machine Reading

  297. Deep Reinforcement Learning for Dialogue Generation

  298. Programming with a Differentiable Forth Interpreter

  299. Training Deep Nets with Sublinear Memory Cost

  300. Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

  301. Improving sentence compression by learning to predict gaze

  302. Adaptive Computation Time for Recurrent Neural Networks

  303. Dynamic Memory Networks for Visual and Textual Question Answering

  304. PlaNet—Photo Geolocation with Convolutional Neural Networks

  305. Learning Distributed Representations of Sentences from Unlabeled Data

  306. Exploring the Limits of Language Modeling

  307. PixelRNN: Pixel Recurrent Neural Networks

  308. Persistent RNNs: Stashing Recurrent Weights On-Chip

  309. Exploring the Limits of Language Modeling § 5.9: Samples from the Model

  311. Deep-Spying: Spying using Smartwatch and Deep Learning

  312. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

  313. Neural GPUs Learn Algorithms

  314. Sequence Level Training with Recurrent Neural Networks

  315. Neural Programmer-Interpreters

  316. Generating Sentences from a Continuous Space

  317. Generative Concatenative Nets Jointly Learn to Write and Classify Reviews

  318. Generating Images from Captions with Attention

  319. Semi-supervised Sequence Learning

  320. BPEs: Neural Machine Translation of Rare Words with Subword Units

  321. Training recurrent networks online without backtracking

  322. Deep Recurrent Q-Learning for Partially Observable MDPs

  323. Teaching Machines to Read and Comprehend

  324. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

  325. Visualizing and Understanding Recurrent Networks

  326. The Unreasonable Effectiveness of Recurrent Neural Networks

  327. Deep Neural Networks for Large Vocabulary Handwritten Text Recognition

  328. Reinforcement Learning Neural Turing Machines—Revised

  329. End-To-End Memory Networks

  330. LSTM: A Search Space Odyssey

  331. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

  332. DRAW: A Recurrent Neural Network For Image Generation

  333. Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

  334. Neural Turing Machines

  335. Learning to Execute

  336. Neural Machine Translation by Jointly Learning to Align and Translate

  337. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

  338. GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

  339. doc2vec: Distributed Representations of Sentences and Documents

  340. A Clockwork RNN

  341. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

  342. Generating Sequences With Recurrent Neural Networks

  343. On the difficulty of training Recurrent Neural Networks

  344. Recurrent Neural Network Based Language Model

  345. Large Language Models in Machine Translation

  346. Learning to Learn Using Gradient Descent

  347. Long Short-Term Memory

  348. Flat Minima

  349. Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity

  350. A Focused Backpropagation Algorithm for Temporal Pattern Recognition

  351. Learning Complex, Extended Sequences Using the Principle of History Compression

  352. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

  353. Untersuchungen zu dynamischen neuronalen Netzen [Studies of dynamic neural networks]

  354. Finding Structure In Time

  355. Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical report CU-CS–495–90]

  356. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks

  357. Recurrent Backpropagation and Hopfield Networks

  358. Backpropagation in Perceptrons with Feedback

  359. Experimental Analysis of the Real-time Recurrent Learning Algorithm

  360. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks

  361. A Sticky-Bit Approach for Learning to Represent State

  362. Generalization of backpropagation with application to a recurrent gas market model

  363. Generalization of back-propagation to recurrent neural networks

  364. The Utility Driven Dynamic Error Propagation Network (RTRL)

  365. A self-optimizing, non-symmetrical neural net for content addressable memory and pattern recognition

  366. Programming a massively parallel, computation universal system: Static behavior

  367. Serial Order: A Parallel Distributed Processing Approach

  368. Hypernetworks [Blog]

  369. Safety-First AI for Autonomous Data Center Cooling and Industrial Control

  370. Attention and Augmented Recurrent Neural Networks

  371. BlinkDL/RWKV-LM: RWKV is an RNN with Transformer-level LLM performance that can be trained directly like a GPT (parallelizable), combining the best of RNNs and Transformers: great performance, fast inference, low VRAM use, fast training, “infinite” ctx_len, and free sentence embeddings.

  372. Efficient, Reusable RNNs and LSTMs for Torch

  373. Updated Training?

  374. minimaxir/textgenrnn: easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

  375. Deep Learning for Assisting the Process of Music Composition (part 3)

  376. Metalearning or Learning to Learn Since 1987

  378. “Stream” (seaandsailor blog)

  379. Composing Music With Recurrent Neural Networks

  381. Droppo 2021, Figure 5: LSTM vs. Transformer scaling

  382. Jaegle et al 2021, Figure 2: Perceiver IO architecture

  383. Li et al 2021, Figure 1: ASR word-error-rate (WER) scaling

  384. DeepMind 2020 (Agent57), Figure 3: timeline of deep reinforcement learning

  385. DeepMind 2020 (Agent57): performance table

  386. Khalifa et al 2017, Example 3: incoherent DeepTingle sample prompted with the Moby-Dick opening “Call me Ishmael”

  387. Krause et al 2017, Figure 2: dynamic-evaluation RNN prediction on Wikipedia & Spanish text, showing test-time adaptation

  388. Ross Goodwin, “Adventures in Narrated Reality”, part 2 (2016-06-09)

  389. Ross Goodwin, “Adventures in Narrated Reality”, part 1 (2016-03-19)

  390. Karpathy 2015: char-RNN visualization archive

  391. http://www.byronknoll.com/cmix.html

  393. https://aclanthology.org/D13-1176.pdf

  395. https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html

  396. https://bellard.org/ts_server/ts_zip.html

  397. https://blog.ought.com/dalca-4d47a90edd92

  399. https://blog.rwkv.com/p/eagle-7b-soaring-past-transformers

  401. https://cprimozic.net/blog/growing-sparse-computational-graphs-with-rnns/

  403. https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf

  404. https://jackcook.com/2024/02/23/mamba.html

  406. https://magenta.tensorflow.org/blog/2017/06/01/waybackprop

  407. https://manifestai.com/blogposts/faster-after-all/

  409. https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf

  410. https://nlp.stanford.edu/~johnhew/rnns-hierarchy.html

  412. https://patentimages.storage.googleapis.com/57/53/22/91b8a6792dbb1e/US20180204116A1.pdf#deepmind

  413. https://sander.ai/2023/07/20/perspectives.html

  414. https://wandb.ai/wandb_fc/articles/reports/Image-to-LaTeX--Vmlldzo1NDQ0MTAx

  415. https://www.ai21.com/blog/announcing-jamba

  417. https://www.canva.dev/blog/engineering/ship-shape/

  418. https://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf#page=67

  420. https://www.lesswrong.com/posts/bD4B2MF7nsGAfH9fj/basic-mathematics-of-predictive-coding

  421. https://www.lesswrong.com/posts/mxa7XZ8ajE2oarWcr/lawrencec-s-shortform#pEqfzPMpqsnhaGrNK

  422. https://www.nature.com/articles/s41598-023-35597-4

  423. https://www.reddit.com/r/MachineLearning/comments/11nre6t/p_rwkv_14b_is_a_strong_chatbot_despite_only/

  424. https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/

  426. https://www.reddit.com/r/MachineLearning/comments/yxt8sa/r_rwkv4_7b_release_an_attentionfree_rnn_language/

  428. https://x.com/BlinkDL_AI/status/1677593798531223552

  429. https://x.com/BlinkDL_AI/status/1784496793075744966

  430. https://x.com/RichardSocher/status/1736161332259614989

  431. https://x.com/arankomatsuzaki/status/1639000379978403853

  432. https://x.com/mayfer/status/1732269798934106133

  433. FlashRNN: Optimizing Traditional RNNs on Modern Hardware

  434. https%253A%252F%252Farxiv.org%252Fabs%252F2412.07752.html

  435. Were RNNs All We Needed?

  436. https%253A%252F%252Farxiv.org%252Fabs%252F2410.01201.html

  437. The Mamba in the Llama: Distilling and Accelerating Hybrid Models

  438. Junxiong Wang

  439. https://rush-nlp.com/

  440. Tri Dao

  441. https%253A%252F%252Farxiv.org%252Fabs%252F2408.15237.html

  442. An Empirical Study of Mamba-based Language Models

  443. Tri Dao

  444. Albert Gu

  445. https%253A%252F%252Farxiv.org%252Fabs%252F2406.07887.html

  446. Grokfast: Accelerated Grokking by Amplifying Slow Gradients

  447. https%253A%252F%252Farxiv.org%252Fabs%252F2405.20233.html

  448. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

  449. Luke Zettlemoyer

  450. Omer Levy

  451. https%253A%252F%252Farxiv.org%252Fabs%252F2404.08801%2523facebook.html

  452. Does Transformer Interpretability Transfer to RNNs?

  453. https%253A%252F%252Farxiv.org%252Fabs%252F2404.05971%2523eleutherai.html

  454. Mechanistic Design and Scaling of Hybrid Architectures

  455. Stefano Ermon

  456. https%253A%252F%252Farxiv.org%252Fabs%252F2403.17844.html

  457. ZigMa: Zigzag Mamba Diffusion Model

  458. https%253A%252F%252Farxiv.org%252Fabs%252F2403.13802.html

  459. Zoology: Measuring and Improving Recall in Efficient Language Models

  460. https%253A%252F%252Farxiv.org%252Fabs%252F2312.04927.html

  461. Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  462. Albert Gu

  463. Tri Dao

  464. https%253A%252F%252Farxiv.org%252Fabs%252F2312.00752.html

  465. Learning few-shot imitation as cultural transmission

  466. https%253A%252F%252Fwww.nature.com%252Farticles%252Fs41467-023-42875-2%2523deepmind.html

  467. Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors

  468. https%253A%252F%252Farxiv.org%252Fabs%252F2310.02980.html

  469. Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

  470. Jeff Clune—Professor—Computer Science—University of British Columbia

  471. https%253A%252F%252Farxiv.org%252Fabs%252F2306.00323.html

  472. RWKV: Reinventing RNNs for the Transformer Era

  473. https%253A%252F%252Farxiv.org%252Fabs%252F2305.13048.html

  474. Resurrecting Recurrent Neural Networks for Long Sequences

  475. Albert Gu

  476. https://sites.google.com/view/razp/home

  477. https%253A%252F%252Farxiv.org%252Fabs%252F2303.06349%2523deepmind.html

  478. SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

  479. https%253A%252F%252Farxiv.org%252Fabs%252F2302.13939.html

  480. Organic reaction mechanism classification using machine learning

  481. %252Fdoc%252Fscience%252F2023-bures.pdf.html

  482. Hungry Hungry Hippos: Towards Language Modeling with State Space Models

  483. Tri Dao

  484. https%253A%252F%252Farxiv.org%252Fabs%252F2212.14052.html

  485. Pretraining Without Attention

  486. Junxiong Wang

  487. Albert Gu

  488. https://rush-nlp.com/

  489. https%253A%252F%252Farxiv.org%252Fabs%252F2212.10544.html

  490. Legged Locomotion in Challenging Terrains using Egocentric Vision

  491. https%253A%252F%252Farxiv.org%252Fabs%252F2211.07638.html

  492. Omnigrok: Grokking Beyond Algorithmic Data

  493. https%253A%252F%252Farxiv.org%252Fabs%252F2210.01117.html

  494. Semantic scene descriptions as an objective of human vision

  495. https%253A%252F%252Farxiv.org%252Fabs%252F2209.11737.html

  496. Sequencer: Deep LSTM for Image Classification

  497. https%253A%252F%252Farxiv.org%252Fabs%252F2205.01972.html

  498. Semantic projection recovers rich human knowledge of multiple object features from word embeddings

  499. %252Fdoc%252Fai%252Fnn%252Frnn%252F2022-grand.pdf.html

  500. Block-Recurrent Transformers

  501. Yuhuai (Tony) Wu’s Home Page

  502. Behnam Neyshabur

  503. https%253A%252F%252Farxiv.org%252Fabs%252F2203.07852.html

  504. General-purpose, long-context autoregressive modeling with Perceiver AR

  505. https%253A%252F%252Farxiv.org%252Fabs%252F2202.07765%2523deepmind.html

  506. Learning robust perceptive locomotion for quadrupedal robots in the wild

  507. %252Fdoc%252Freinforcement-learning%252Fmeta-learning%252F2022-miki.pdf.html

  508. S4: Efficiently Modeling Long Sequences with Structured State Spaces

  509. Albert Gu

  510. https%253A%252F%252Farxiv.org%252Fabs%252F2111.00396.html

  511. A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

  512. https%253A%252F%252Felifesciences.org%252Farticles%252F66039.html

  513. Perceiver IO: A General Architecture for Structured Inputs & Outputs

  514. https%253A%252F%252Farxiv.org%252Fabs%252F2107.14795%2523deepmind.html

  515. PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

  516. Luke Metz

  517. Jascha Sohl-Dickstein

  518. https%253A%252F%252Fproceedings.mlr.press%252Fv139%252Fvicol21a.html.html

  519. Shelley: A Crowd-sourced Collaborative Horror Writer

  520. %252Fdoc%252Fai%252Fnn%252Frnn%252F2021-delul.pdf.html

  521. Ten Lessons From Three Generations Shaped Google’s TPUv4i

  522. %252Fdoc%252Fai%252Fscaling%252Fhardware%252F2021-jouppi.pdf.html

  523. RASP: Thinking Like Transformers

  524. https%253A%252F%252Farxiv.org%252Fabs%252F2106.06981.html

  525. Scaling Laws for Acoustic Models

  526. https%253A%252F%252Farxiv.org%252Fabs%252F2106.09488%2523amazon.html

  527. Perceiver: General Perception with Iterative Attention

  528. https%253A%252F%252Farxiv.org%252Fabs%252F2103.03206%2523deepmind.html

  529. Deep Residual Learning in Spiking Neural Networks

  530. https%253A%252F%252Farxiv.org%252Fabs%252F2102.04159.html

  531. Towards Playing Full MOBA Games with Deep Reinforcement Learning

  532. https%253A%252F%252Farxiv.org%252Fabs%252F2011.12692%2523tencent.html

  533. HiPPO: Recurrent Memory with Optimal Polynomial Projections

  534. Albert Gu

  535. Tri Dao

  536. Stefano Ermon

  537. https%253A%252F%252Farxiv.org%252Fabs%252F2008.07669.html

  538. Matt Botvinick on the spontaneous emergence of learning algorithms

  539. https%253A%252F%252Fwww.lesswrong.com%252Fposts%252FWnqua6eQkewL3bqsF%252Fmatt-botvinick-on-the-spontaneous-emergence-of-learning.html

  540. Agent57: Outperforming the human Atari benchmark

  541. https%253A%252F%252Fdeepmind.google%252Fdiscover%252Fblog%252Fagent57-outperforming-the-human-atari-benchmark%252F.html

  542. Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

  543. Stefano Ermon

  544. https%253A%252F%252Farxiv.org%252Fabs%252F2002.03629.html

  545. Scaling Laws for Neural Language Models

  546. Jared Kaplan

  547. Sam McCandlish

  548. Alec Radford

  549. https%253A%252F%252Farxiv.org%252Fabs%252F2001.08361%2523openai.html

  550. Single Headed Attention RNN: Stop Thinking With Your Head

  551. State of the Smerity

  552. https%253A%252F%252Farxiv.org%252Fabs%252F1911.11423.html

  553. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

  554. https%253A%252F%252Fopenreview.net%252Fforum%253Fid%253DHyxlRHBlUB.html

  555. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

  556. https%253A%252F%252Farxiv.org%252Fabs%252F1910.06591%2523deepmind.html

  557. Mogrifier LSTM

  558. https%253A%252F%252Farxiv.org%252Fabs%252F1909.01792%2523deepmind.html

  559. Language Modeling State-of-the-art leaderboards

  560. https%253A%252F%252Fpaperswithcode.com%252Ftask%252Flanguage-modelling.html

  561. Meta-learners’ learning dynamics are unlike learners’

  562. https%253A%252F%252Farxiv.org%252Fabs%252F1905.01320%2523deepmind.html

  563. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  564. Zihang Dai

  565. Zhilin Yang

  566. https://www.cs.cmu.edu/~./yiming/

  567. https%253A%252F%252Farxiv.org%252Fabs%252F1901.02860.html

  568. R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

  569. https%253A%252F%252Fopenreview.net%252Fforum%253Fid%253Dr1lyTjAqYX%2523deepmind.html

  570. ULMFiT: Universal Language Model Fine-tuning for Text Classification

  571. https%253A%252F%252Farxiv.org%252Fabs%252F1801.06146.html

  572. Dynamic Evaluation of Neural Sequence Models

  573. https%253A%252F%252Farxiv.org%252Fabs%252F1709.07432.html

  574. SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

  575. https%253A%252F%252Farxiv.org%252Fabs%252F1709.02755.html

  576. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

  577. Kyunghyun Cho

  578. https%253A%252F%252Farxiv.org%252Fabs%252F1704.05179.html

  579. Learning to Reason: End-to-End Module Networks for Visual Question Answering

  580. Jacob Andreas @ MIT

  581. https%253A%252F%252Farxiv.org%252Fabs%252F1704.05526.html

  582. Clockwork Convnets for Video Semantic Segmentation

  583. https%253A%252F%252Farxiv.org%252Fabs%252F1608.03609.html

  584. End-To-End Memory Networks

  585. https%253A%252F%252Farxiv.org%252Fabs%252F1503.08895.html

  586. Recurrent Neural Network Based Language Model

  587. %252Fdoc%252Fai%252Fnn%252Frnn%252F2010-mikolov.pdf.html

  588. Untersuchungen zu dynamischen neuronalen Netzen [Studies of dynamic neural networks]

  589. %252Fdoc%252Fai%252Fnn%252Frnn%252F1991-hochreiter.pdf.html

  590. Experimental Analysis of the Real-time Recurrent Learning Algorithm

  591. %252Fdoc%252Fai%252Fnn%252Frnn%252F1989-williams.pdf.html

  592. Generalization of backpropagation with application to a recurrent gas market model

  593. %252Fdoc%252Fai%252Fnn%252Frnn%252F1988-werbos.pdf.html

  594. Generalization of back-propagation to recurrent neural networks

  595. %252Fdoc%252Fai%252Fnn%252Frnn%252F1987-pineda.pdf.html