‘RNN’ directory

See Also
Gwern
- “Absolute Unit NNs: Regression-Based MLPs for Everything ”, Gwern 2023
- “RNN Metadata for Mimicking Author Style ”, Gwern 2015
Links
Miscellaneous
Bibliography

See Also

Gwern

“Absolute Unit NNs: Regression-Based MLPs for Everything ”, Gwern 2023

Absolute Unit NNs: Regression-Based MLPs for Everything

“RNN Metadata for Mimicking Author Style ”, Gwern 2015

RNN Metadata for Mimicking Author Style

Links

“LSTM or Transformer As ‘Malware Packer’ ”, Bednarskiwsieci 2025

LSTM or Transformer as ‘malware packer’

“It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization ”, Behrouz et al 2025

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

“M1: Towards Scalable Test-Time Compute With Mamba Reasoning Models ”, Wang et al 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

“One-Minute Video Generation With Test-Time Training ”, Dalal et al 2025

One-Minute Video Generation with Test-Time Training

“RWKV-7 ‘Goose’ With Expressive Dynamic State Evolution ”, Peng et al 2025

RWKV-7 ‘Goose’ with Expressive Dynamic State Evolution

“Training Language Models for Social Deduction With Multi-Agent Reinforcement Learning ”, Sarkar et al 2025

Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

“Test-Time Regression: a Unifying Framework for Designing Sequence Models With Associative Memory ”, Wang et al 2025

Test-time regression: a unifying framework for designing sequence models with associative memory

“Titans: Learning to Memorize at Test Time ”, Behrouz et al 2024

Titans: Learning to Memorize at Test Time

“Human-Like Bots for Tactical Shooters Using Compute-Efficient Sensors ”, Justesen et al 2024

Human-like Bots for Tactical Shooters Using Compute-Efficient Sensors

“How AI Is Unlocking Ancient Texts: From Deciphering Burnt Roman Scrolls to Reading Crumbling Cuneiform Tablets, Neural Networks Could Give Researchers More Data Than They’ve Had in Centuries ”, Marchant 2024

How AI is unlocking ancient texts: From deciphering burnt Roman scrolls to reading crumbling cuneiform tablets, neural networks could give researchers more data than they’ve had in centuries

“FlashRNN: Optimizing Traditional RNNs on Modern Hardware ”, Pöppel et al 2024

FlashRNN: Optimizing Traditional RNNs on Modern Hardware

“Gated Delta Networks: Improving Mamba-2 With Delta Rule ”, Yang et al 2024

Gated Delta Networks: Improving Mamba-2 with Delta Rule

“Hymba: A Hybrid-Head Architecture for Small Language Models ”, Dong et al 2024

Hymba: A Hybrid-head Architecture for Small Language Models

“State-Space Models Can Learn In-Context by Gradient Descent ”, Sushma et al 2024

State-space models can learn in-context by gradient descent

“Were RNNs All We Needed? ”, Feng et al 2024

Were RNNs All We Needed?

“The Mamba in the Llama: Distilling and Accelerating Hybrid Models ”, Wang et al 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

“`handwriter.ttf`: Handwriting Synthesis With Harfbuzz WASM ”, Jingyi 2024

handwriter.ttf: Handwriting synthesis with Harfbuzz WASM

“Learning to (Learn at Test Time): RNNs With Expressive Hidden States ”, Sun et al 2024

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

“An Empirical Study of Mamba-Based Language Models ”, Waleffe et al 2024

An Empirical Study of Mamba-based Language Models

“State Soup: In-Context Skill Learning, Retrieval and Mixing ”, Pióro et al 2024

State Soup: In-Context Skill Learning, Retrieval and Mixing

“RWKV-CLIP: A Robust Vision-Language Representation Learner ”, Gu et al 2024

RWKV-CLIP: A Robust Vision-Language Representation Learner

“Transformers Are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality ”, Dao & Gu 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

“Grokfast: Accelerated Grokking by Amplifying Slow Gradients ”, Lee et al 2024

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

“DiM: Scaling Diffusion Mamba With Bidirectional SSMs for Efficient Image and Video Generation ”, Mo & Tian 2024

DiM: Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

“Attention As an RNN ”, Feng et al 2024

Attention as an RNN

“XLSTM: Extended Long Short-Term Memory ”, Beck et al 2024

xLSTM: Extended Long Short-Term Memory

“Megalodon: Efficient LLM Pretraining and Inference With Unlimited Context Length ”, Ma et al 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

“The Illusion of State in State-Space Models ”, Merrill et al 2024

The Illusion of State in State-Space Models

“An Accurate and Rapidly Calibrating Speech Neuroprosthesis ”, Card et al 2024

An accurate and rapidly calibrating speech neuroprosthesis

“Does Transformer Interpretability Transfer to RNNs? ”, Paulo et al 2024

Does Transformer Interpretability Transfer to RNNs?

“Mechanistic Design and Scaling of Hybrid Architectures ”, Poli et al 2024

Mechanistic Design and Scaling of Hybrid Architectures

“GLE: Backpropagation through Space, Time, and the Brain ”, Ellenberger et al 2024

GLE: Backpropagation through space, time, and the brain

“ZigMa: Zigzag Mamba Diffusion Model ”, Hu et al 2024

ZigMa: Zigzag Mamba Diffusion Model

“RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval ”, Wen et al 2024

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

“Verifiable Evaluations of Machine Learning Models Using ZkSNARKs ”, South et al 2024

Verifiable evaluations of machine learning models using zkSNARKs

“MambaByte: Token-Free Selective State Space Model ”, Wang et al 2024

MambaByte: Token-free Selective State Space Model

“MoE-Mamba: Efficient Selective State Space Models With Mixture of Experts ”, Pióro et al 2024

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

“Evolving Reservoirs for Meta Reinforcement Learning ”, Léger et al 2023

Evolving Reservoirs for Meta Reinforcement Learning

“Zoology: Measuring and Improving Recall in Efficient Language Models ”, Arora et al 2023

Zoology: Measuring and Improving Recall in Efficient Language Models

“Mamba: Linear-Time Sequence Modeling With Selective State Spaces ”, Gu & Dao 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

“Diffusion Models Without Attention ”, Yan et al 2023

Diffusion Models Without Attention

“Learning Few-Shot Imitation As Cultural Transmission ”, Bhoopchand et al 2023

Learning few-shot imitation as cultural transmission

“Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks ”, Ramesh et al 2023

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

“HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling ”, Qin et al 2023

HGRN: Hierarchically Gated Recurrent Neural Network for Sequence Modeling

“On Prefrontal Working Memory and Hippocampal Episodic Memory: Unifying Memories Stored in Weights and Activation Slots ”, Whittington et al 2023

On prefrontal working memory and hippocampal episodic memory: Unifying memories stored in weights and activation slots

“GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling ”, Katsch 2023

GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling

“ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-Like Language Models ”, Luo et al 2023

ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models

“Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study With Linear Models ”, Fu et al 2023

Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models

“Generalization in Sensorimotor Networks Configured With Natural Language Instructions ”, Riveland & Pouget 2023

Generalization in Sensorimotor Networks Configured with Natural Language Instructions

“Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors ”, Amos et al 2023

Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors

“Parallelizing Non-Linear Sequential Models over the Sequence Length ”, Lim et al 2023

Parallelizing non-linear sequential models over the sequence length

“Gated Recurrent Neural Networks Discover Attention ”, Zucchet et al 2023

Gated recurrent neural networks discover attention

“A High-Performance Neuroprosthesis for Speech Decoding and Avatar Control ”, Metzger et al 2023

A high-performance neuroprosthesis for speech decoding and avatar control

“`ts_zip`: Text Compression Using Large Language Models [RWKV 169M V4] ”, Bellard 2023

ts_zip: Text Compression using Large Language Models [RWKV 169M v4] :

View HTML:

/doc/www/bellard.org/7f1da4826605365242b7d40d83f8e13785239846.html

“Learning to Model the World With Language ”, Lin et al 2023

Learning to Model the World with Language

“Retentive Network: A Successor to Transformer for Large Language Models ”, Sun et al 2023

Retentive Network: A Successor to Transformer for Large Language Models

“Using Sequences of Life-Events to Predict Human Lives ”, Savcisens et al 2023

Using Sequences of Life-events to Predict Human Lives

“Thought Cloning: Learning to Think While Acting by Imitating Human Thinking ”, Hu & Clune 2023

Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

“RWKV: Reinventing RNNs for the Transformer Era ”, Peng et al 2023

RWKV: Reinventing RNNs for the Transformer Era

“Emergence of Belief-Like Representations through Reinforcement Learning ”, Hennig et al 2023

Emergence of belief-like representations through reinforcement learning

“Model Scale versus Domain Knowledge in Statistical Forecasting of Chaotic Systems ”, Gilpin 2023

Model scale versus domain knowledge in statistical forecasting of chaotic systems

“Resurrecting Recurrent Neural Networks for Long Sequences ”, Orvieto et al 2023

Resurrecting Recurrent Neural Networks for Long Sequences

“SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks ”, Zhu et al 2023

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

“Organic Reaction Mechanism Classification Using Machine Learning ”, Burés & Larrosa 2023

Organic reaction mechanism classification using machine learning

“A High-Performance Speech Neuroprosthesis ”, Willett et al 2023

A high-performance speech neuroprosthesis

“Hungry Hungry Hippos: Towards Language Modeling With State Space Models ”, Fu et al 2022

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

“Pretraining Without Attention ”, Wang et al 2022

Pretraining Without Attention

“A 64-Core Mixed-Signal In-Memory Compute Chip Based on Phase-Change Memory for Deep Neural Network Inference ”, Gallo et al 2022

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

“Melting Pot 2.0 ”, Agapiou et al 2022

Melting Pot 2.0

“VeLO: Training Versatile Learned Optimizers by Scaling Up ”, Metz et al 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

“Legged Locomotion in Challenging Terrains Using Egocentric Vision ”, Agarwal et al 2022

Legged Locomotion in Challenging Terrains using Egocentric Vision

“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities ”, Tjandra et al 2022

Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities

“Perfectly Secure Steganography Using Minimum Entropy Coupling ”, Witt et al 2022

Perfectly Secure Steganography Using Minimum Entropy Coupling

“Omnigrok: Grokking Beyond Algorithmic Data ”, Liu et al 2022

Omnigrok: Grokking Beyond Algorithmic Data

“Semantic Scene Descriptions As an Objective of Human Vision ”, Doerig et al 2022

Semantic scene descriptions as an objective of human vision

“Benchmarking Compositionality With Formal Languages ”, Valvoda et al 2022

Benchmarking Compositionality with Formal Languages

“Learning to Generalize With Object-Centric Agents in the Open World Survival Game Crafter ”, Stanić et al 2022

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

“PI-ARS: Accelerating Evolution-Learned Visual-Locomotion With Predictive Information Representations ”, Lee et al 2022

PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

“Spatial Representation by Ramping Activity of Neurons in the Retrohippocampal Cortex ”, Tennant et al 2022

Spatial representation by ramping activity of neurons in the retrohippocampal cortex

“Neural Networks and the Chomsky Hierarchy ”, Delétang et al 2022

Neural Networks and the Chomsky Hierarchy

“BYOL-Explore: Exploration by Bootstrapped Prediction ”, Guo et al 2022

BYOL-Explore: Exploration by Bootstrapped Prediction

“AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos ”, Wu et al 2022

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos

“Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL) ”, Caccia et al 2022

Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)

“Simple Recurrence Improves Masked Language Models ”, Lei et al 2022

Simple Recurrence Improves Masked Language Models

“Sequencer: Deep LSTM for Image Classification ”, Tatsunami & Taki 2022

Sequencer: Deep LSTM for Image Classification

“Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers ”, Chan et al 2022

Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers

“Semantic Projection Recovers Rich Human Knowledge of Multiple Object Features from Word Embeddings ”, Grand et al 2022

Semantic projection recovers rich human knowledge of multiple object features from word embeddings

“Block-Recurrent Transformers ”, Hutchins et al 2022

Block-Recurrent Transformers

“All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL ”, Arulkumaran et al 2022

All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

“Retrieval-Augmented Reinforcement Learning ”, Goyal et al 2022

Retrieval-Augmented Reinforcement Learning

“Learning by Directional Gradient Descent ”, Silver et al 2022

Learning by Directional Gradient Descent

“General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR ”, Hawthorne et al 2022

General-purpose, long-context autoregressive modeling with Perceiver AR

“End-To-End Algorithm Synthesis With Recurrent Networks: Logical Extrapolation Without Overthinking ”, Bansal et al 2022

End-to-end Algorithm Synthesis with Recurrent Networks: Logical Extrapolation Without Overthinking

“Data Scaling Laws in NMT: The Effect of Noise and Architecture ”, Bansal et al 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

“Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies ”, Gklezakos & Rao 2022

Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

“Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild ”, Miki et al 2022

Learning robust perceptive locomotion for quadrupedal robots in the wild

“Inducing Causal Structure for Interpretable Neural Networks (IIT) ”, Geiger et al 2021

Inducing Causal Structure for Interpretable Neural Networks (IIT)

“Evaluating Distributional Distortion in Neural Language Modeling ”, Anonymous 2021

Evaluating Distributional Distortion in Neural Language Modeling

“Gradients Are Not All You Need ”, Metz et al 2021

Gradients are Not All You Need

“An Explanation of In-Context Learning As Implicit Bayesian Inference ”, Xie et al 2021

An Explanation of In-context Learning as Implicit Bayesian Inference

“S4: Efficiently Modeling Long Sequences With Structured State Spaces ”, Gu et al 2021

S4: Efficiently Modeling Long Sequences with Structured State Spaces

“Minimum Description Length Recurrent Neural Networks ”, Lan et al 2021

Minimum Description Length Recurrent Neural Networks

“LSSL: Combining Recurrent, Convolutional, and Continuous-Time Models With Linear State-Space Layers ”, Gu et al 2021

LSSL: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

“A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-Dependent Action Selection ”, Hulse et al 2021

A connectome of the Drosophila central complex reveals network motifs suitable for flexible navigation and context-dependent action selection

“Recurrent Model-Free RL Is a Strong Baseline for Many POMDPs ”, Ni et al 2021

Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

“AFT: An Attention Free Transformer ”, Zhai et al 2021

AFT: An Attention Free Transformer

“Photos Are All You Need for Reciprocal Recommendation in Online Dating ”, Neve & McConville 2021

Photos Are All You Need for Reciprocal Recommendation in Online Dating

“Perceiver IO: A General Architecture for Structured Inputs & Outputs ”, Jaegle et al 2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

“PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies ”, Vicol et al 2021

PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies

“Shelley: A Crowd-Sourced Collaborative Horror Writer ”, Delul et al 2021

Shelley: A Crowd-sourced Collaborative Horror Writer

“Ten Lessons From Three Generations Shaped Google’s TPUv4i ”, Jouppi et al 2021

Ten Lessons From Three Generations Shaped Google’s TPUv4i

“RASP: Thinking Like Transformers ”, Weiss et al 2021

RASP: Thinking Like Transformers

“Scaling Laws for Acoustic Models ”, Droppo & Elibol 2021

Scaling Laws for Acoustic Models

“Scaling End-To-End Models for Large-Scale Multilingual ASR ”, Li et al 2021

Scaling End-to-End Models for Large-Scale Multilingual ASR

“Sensitivity As a Complexity Measure for Sequence Classification Tasks ”, Hahn et al 2021

Sensitivity as a Complexity Measure for Sequence Classification Tasks

“ALD: Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation ”, Parisotto & Salakhutdinov 2021

ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

“Finetuning Pretrained Transformers into RNNs ”, Kasai et al 2021

Finetuning Pretrained Transformers into RNNs

“Pretrained Transformers As Universal Computation Engines ”, Lu et al 2021

Pretrained Transformers as Universal Computation Engines

“Perceiver: General Perception With Iterative Attention ”, Jaegle et al 2021

Perceiver: General Perception with Iterative Attention

“When Attention Meets Fast Recurrence: Training SRU++ Language Models With Reduced Compute ”, Lei 2021

When Attention Meets Fast Recurrence: Training SRU++ Language Models with Reduced Compute

“Generative Speech Coding With Predictive Variance Regularization ”, Kleijn et al 2021

Generative Speech Coding with Predictive Variance Regularization

“Predictive Coding Is a Consequence of Energy Efficiency in Recurrent Neural Networks ”, Ali et al 2021

Predictive coding is a consequence of energy efficiency in recurrent neural networks

“Deep Residual Learning in Spiking Neural Networks ”, Fang et al 2021

Deep Residual Learning in Spiking Neural Networks

“Distilling Large Language Models into Tiny and Effective Students Using PQRNN ”, Kaliamoorthi et al 2021

Distilling Large Language Models into Tiny and Effective Students using pQRNN

“Meta Learning Backpropagation And Improving It ”, Kirsch & Schmidhuber 2020

Meta Learning Backpropagation And Improving It

“On the Binding Problem in Artificial Neural Networks ”, Greff et al 2020

On the Binding Problem in Artificial Neural Networks

“A Recurrent Vision-And-Language BERT for Navigation ”, Hong et al 2020

A Recurrent Vision-and-Language BERT for Navigation

“Towards Playing Full MOBA Games With Deep Reinforcement Learning ”, Ye et al 2020

Towards Playing Full MOBA Games with Deep Reinforcement Learning

“Multimodal Dynamics Modeling for Off-Road Autonomous Vehicles ”, Tremblay et al 2020

Multimodal dynamics modeling for off-road autonomous vehicles

“Adversarial Vulnerabilities of Human Decision-Making ”, Dezfouli et al 2020

Adversarial vulnerabilities of human decision-making

“Learning to Summarize Long Texts With Memory Compression and Transfer ”, Park et al 2020

Learning to Summarize Long Texts with Memory Compression and Transfer

“Human-Centric Dialog Training via Offline Reinforcement Learning ”, Jaques et al 2020

Human-centric Dialog Training via Offline Reinforcement Learning

“Deep Reinforcement Learning for Closed-Loop Blood Glucose Control ”, Fox et al 2020

Deep Reinforcement Learning for Closed-Loop Blood Glucose Control

“HiPPO: Recurrent Memory With Optimal Polynomial Projections ”, Gu et al 2020

HiPPO: Recurrent Memory with Optimal Polynomial Projections

“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size ”, Yoshida et al 2020

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

“Matt Botvinick on the Spontaneous Emergence of Learning Algorithms ”, Scholl 2020

Matt Botvinick on the spontaneous emergence of learning algorithms

“Cultural Influences on Word Meanings Revealed through Large-Scale Semantic Alignment ”, Thompson et al 2020

Cultural influences on word meanings revealed through large-scale semantic alignment

“DeepSinger: Singing Voice Synthesis With Data Mined From the Web ”, Ren et al 2020

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

“High-Performance Brain-To-Text Communication via Imagined Handwriting ”, Willett et al 2020

High-performance brain-to-text communication via imagined handwriting

“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention ”, Katharopoulos et al 2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

“The Recurrent Neural Tangent Kernel ”, Alemohammad et al 2020

The Recurrent Neural Tangent Kernel

“Untangling Tradeoffs between Recurrence and Self-Attention in Neural Networks ”, Kerg et al 2020

Untangling tradeoffs between recurrence and self-attention in neural networks

“Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing ”, Dai et al 2020

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

“Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models ”, Papadimitriou & Jurafsky 2020

Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

“Syntactic Structure from Deep Learning ”, Linzen & Baroni 2020

Syntactic Structure from Deep Learning

“Agent57: Outperforming the Human Atari Benchmark ”, Puigdomènech et al 2020

Agent57: Outperforming the human Atari benchmark

“Machine Translation of Cortical Activity to Text With an Encoder-Decoder Framework ”, Makin et al 2020

Machine translation of cortical activity to text with an encoder-decoder framework :

View PDF:

/doc/ai/nn/rnn/2020-makin.pdf

“Learning-Based Memory Allocation for C++ Server Workloads ”, Maas et al 2020

Learning-based Memory Allocation for C++ Server Workloads

“Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving ”, Song et al 2020

Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving

“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks ”, Hasson et al 2020

Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks

“Scaling Laws for Neural Language Models ”, Kaplan et al 2020

Scaling Laws for Neural Language Models

“Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence ”, Yang et al 2020

Estimating the deep replicability of scientific findings using human and artificial intelligence

“Placing Language in an Integrated Understanding System: Next Steps toward Human-Level Performance in Neural Language Models ”, McClelland et al 2020

Placing language in an integrated understanding system: Next steps toward human-level performance in neural language models

“Measuring Compositional Generalization: A Comprehensive Method on Realistic Data ”, Keysers et al 2019

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

“SimpleBooks: Long-Term Dependency Book Dataset With Simplified English Vocabulary for Word-Level Language Modeling ”, Nguyen 2019

SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling

“Single Headed Attention RNN: Stop Thinking With Your Head ”, Merity 2019

Single Headed Attention RNN: Stop Thinking With Your Head

“Excavate ”, Lynch 2019

“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model ”, Schrittwieser et al 2019

MuZero: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning ”, Lin et al 2019

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks ”, Villegas et al 2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks

“Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks ”, Voelker et al 2019

Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks

“SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference ”, Espeholt et al 2019

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

“Mixed-Signal Neuromorphic Processors: Quo Vadis? ”, Bavandpour et al 2019

Mixed-Signal Neuromorphic Processors: Quo vadis?

“Restoring Ancient Text Using Deep Learning (Pythia): a Case Study on Greek Epigraphy ”, Assael et al 2019

Restoring ancient text using deep learning (Pythia): a case study on Greek epigraphy

“Mogrifier LSTM ”, Melis et al 2019

Mogrifier LSTM

“R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems ”, Paine et al 2019

R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

“Language Modeling State-Of-The-Art Leaderboards ”, paperswithcode.com 2019

Language Modeling State-of-the-art leaderboards

“Metalearned Neural Memory ”, Munkhdalai et al 2019

Metalearned Neural Memory

“Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank ”, Socher et al 2019

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

“Generating Text With Recurrent Neural Networks ”, Sutskever et al 2019

Generating Text with Recurrent Neural Networks

“XLNet: Generalized Autoregressive Pretraining for Language Understanding ”, Yang et al 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding

“Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP ”, Yu et al 2019

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

“MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows ”, Henter et al 2019

MoGlow: Probabilistic and controllable motion synthesis using normalizing flows

“Reinforcement Learning, Fast and Slow ”, Botvinick et al 2019

Reinforcement Learning, Fast and Slow

“Meta-Learners’ Learning Dynamics Are unlike Learners’ ”, Rabinowitz 2019

Meta-learners’ learning dynamics are unlike learners’

“Speech Synthesis from Neural Decoding of Spoken Sentences ”, Anumanchipalli et al 2019

Speech synthesis from neural decoding of spoken sentences

“Good News, Everyone! Context Driven Entity-Aware Captioning for News Images ”, Biten et al 2019

Good News, Everyone! Context driven entity-aware captioning for news images

“Surrogate Gradient Learning in Spiking Neural Networks ”, Neftci et al 2019

Surrogate Gradient Learning in Spiking Neural Networks

“On the Turing Completeness of Modern Neural Network Architectures ”, Pérez et al 2019

On the Turing Completeness of Modern Neural Network Architectures

“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context ”, Dai et al 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

“Natural Questions: A Benchmark for Question Answering Research ”, Kwiatkowski et al 2019

Natural Questions: A Benchmark for Question Answering Research

“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos ”, Villegas et al 2019

High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks: Videos

“Bayesian Layers: A Module for Neural Network Uncertainty ”, Tran et al 2018

Bayesian Layers: A Module for Neural Network Uncertainty

“Meta-Learning: Learning to Learn Fast ”, Weng 2018

Meta-Learning: Learning to Learn Fast

“Piano Genie ”, Donahue et al 2018

Piano Genie

“Learning Recurrent Binary/Ternary Weights ”, Ardakani et al 2018

Learning Recurrent Binary/Ternary Weights

“R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning ”, Kapturowski et al 2018

R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning

“HotpotQA: A Dataset for Diverse, Explainable Multi-Hop Question Answering ”, Yang et al 2018

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

“Adversarial Reprogramming of Text Classification Neural Networks ”, Neekhara et al 2018

Adversarial Reprogramming of Text Classification Neural Networks

“Object Hallucination in Image Captioning ”, Rohrbach et al 2018

Object Hallucination in Image Captioning

“This Time With Feeling: Learning Expressive Musical Performance ”, Oore et al 2018

This Time with Feeling: Learning Expressive Musical Performance

“Character-Level Language Modeling With Deeper Self-Attention ”, Al-Rfou et al 2018

Character-Level Language Modeling with Deeper Self-Attention

“General Value Function Networks ”, Schlegel et al 2018

General Value Function Networks

“Deep-Speare: A Joint Neural Model of Poetic Language, Meter and Rhyme ”, Lau et al 2018

Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme

“Universal Transformers ”, Dehghani et al 2018

Universal Transformers

“Accurate Uncertainties for Deep Learning Using Calibrated Regression ”, Kuleshov et al 2018

Accurate Uncertainties for Deep Learning Using Calibrated Regression

“The Natural Language Decathlon: Multitask Learning As Question Answering ”, McCann et al 2018

The Natural Language Decathlon: Multitask Learning as Question Answering

“Neural Ordinary Differential Equations ”, Chen et al 2018

Neural Ordinary Differential Equations

“Know What You Don’t Know: Unanswerable Questions for SQuAD ”, Rajpurkar et al 2018

Know What You Don’t Know: Unanswerable Questions for SQuAD

“DVRL: Deep Variational Reinforcement Learning for POMDPs ”, Igl et al 2018

DVRL: Deep Variational Reinforcement Learning for POMDPs

“Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data ”, Yang et al 2018

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

“Hierarchical Neural Story Generation ”, Fan et al 2018

Hierarchical Neural Story Generation

“Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context ”, Khandelwal et al 2018

Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

“Newsroom: A Dataset of 1.3 Million Summaries With Diverse Extractive Strategies ”, Grusky et al 2018

Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

“A Tree Search Algorithm for Sequence Labeling ”, Lao et al 2018

A Tree Search Algorithm for Sequence Labeling

“Large Scale Distributed Neural Network Training through Online Distillation ”, Anil et al 2018

Large scale distributed neural network training through online distillation

“An Analysis of Neural Language Modeling at Multiple Scales ”, Merity et al 2018

An Analysis of Neural Language Modeling at Multiple Scales

“Reviving and Improving Recurrent Back-Propagation ”, Liao et al 2018

Reviving and Improving Recurrent Back-Propagation

“Learning Memory Access Patterns ”, Hashemi et al 2018

Learning Memory Access Patterns

“Learning Longer-Term Dependencies in RNNs With Auxiliary Losses ”, Trinh et al 2018

Learning Longer-term Dependencies in RNNs with Auxiliary Losses

“One Big Net For Everything ”, Schmidhuber 2018

One Big Net For Everything

“Efficient Neural Audio Synthesis ”, Kalchbrenner et al 2018

Efficient Neural Audio Synthesis

“Deep Contextualized Word Representations ”, Peters et al 2018

Deep contextualized word representations

“M-Walk: Learning to Walk over Graphs Using Monte Carlo Tree Search ”, Shen et al 2018

M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search

“Overcoming the Vanishing Gradient Problem in Plain Recurrent Networks ”, Hu et al 2018

Overcoming the vanishing gradient problem in plain recurrent networks

“ULMFiT: Universal Language Model Fine-Tuning for Text Classification ”, Howard & Ruder 2018

ULMFiT: Universal Language Model Fine-tuning for Text Classification

“The Compute and Data Moats Are Dead ”, Merity 2018

The compute and data moats are dead

“Large-Scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL ”, Mayr et al 2018

Large-scale comparison of machine learning methods for drug target prediction on ChEMBL

“A Flexible Approach to Automated RNN Architecture Generation ”, Schrimpf et al 2017

A Flexible Approach to Automated RNN Architecture Generation

“The NarrativeQA Reading Comprehension Challenge ”, Kočiský et al 2017

The NarrativeQA Reading Comprehension Challenge

“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition ”, Ye et al 2017

Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

“LBA: Learning by Asking Questions ”, Misra et al 2017

LBA: Learning by Asking Questions

“Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent ”, Yang et al 2017

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

“Evaluating Prose Style Transfer With the Bible ”, Carlson et al 2017

Evaluating prose style transfer with the Bible

“Breaking the Softmax Bottleneck: A High-Rank RNN Language Model ”, Yang et al 2017

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

“Neural Speed Reading via Skim-RNN ”, Seo et al 2017

Neural Speed Reading via Skim-RNN

“Unsupervised Machine Translation Using Monolingual Corpora Only ”, Lample et al 2017

Unsupervised Machine Translation Using Monolingual Corpora Only

“Generalization without Systematicity: On the Compositional Skills of Sequence-To-Sequence Recurrent Networks ”, Lake & Baroni 2017

Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks

“Mixed Precision Training ”, Micikevicius et al 2017

Mixed Precision Training

“To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression ”, Zhu & Gupta 2017

To prune, or not to prune: exploring the efficacy of pruning for model compression

“Dynamic Evaluation of Neural Sequence Models ”, Krause et al 2017

Dynamic Evaluation of Neural Sequence Models

“Online Learning of a Memory for Learning Rates ”, Meier et al 2017

Online Learning of a Memory for Learning Rates

“Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification ”, Shim et al 2017

Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification

“N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning ”, Ashok et al 2017

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

“SRU: Simple Recurrent Units for Highly Parallelizable Recurrence ”, Lei et al 2017

SRU: Simple Recurrent Units for Highly Parallelizable Recurrence

“Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks ”, Jayaraman & Grauman 2017

Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks

“Twin Networks: Matching the Future for Sequence Generation ”, Serdyuk et al 2017

Twin Networks: Matching the Future for Sequence Generation

“Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks ”, Campos et al 2017

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

“Regularizing and Optimizing LSTM Language Models ”, Merity et al 2017

Regularizing and Optimizing LSTM Language Models

“Revisiting Activation Regularization for Language RNNs ”, Merity et al 2017

Revisiting Activation Regularization for Language RNNs

“Bayesian Sparsification of Recurrent Neural Networks ”, Lobacheva et al 2017

Bayesian Sparsification of Recurrent Neural Networks

“On the State-Of-The-Art of Evaluation in Neural Language Models ”, Melis et al 2017

On the State-of-the-Art of Evaluation in Neural Language Models

“Controlling Linguistic Style Aspects in Neural Language Generation ”, Ficler & Goldberg 2017

Controlling Linguistic Style Aspects in Neural Language Generation

“Device Placement Optimization With Reinforcement Learning ”, Mirhoseini et al 2017

Device Placement Optimization with Reinforcement Learning

“Six Challenges for Neural Machine Translation ”, Koehn & Knowles 2017

Six Challenges for Neural Machine Translation

“Towards Synthesizing Complex Programs from Input-Output Examples ”, Chen et al 2017

Towards Synthesizing Complex Programs from Input-Output Examples

“Language Generation With Recurrent Generative Adversarial Networks without Pre-Training ”, Press et al 2017

Language Generation with Recurrent Generative Adversarial Networks without Pre-training

“Biased Importance Sampling for Deep Neural Network Training ”, Katharopoulos & Fleuret 2017

Biased Importance Sampling for Deep Neural Network Training

“Deriving Neural Architectures from Sequence and Graph Kernels ”, Lei et al 2017

Deriving Neural Architectures from Sequence and Graph Kernels

“A Deep Reinforced Model for Abstractive Summarization ”, Paulus et al 2017

A Deep Reinforced Model for Abstractive Summarization

“TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension ”, Joshi et al 2017

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

“DeepTingle ”, Khalifa et al 2017

“A Neural Network System for Transformation of Regional Cuisine Style ”, Kazama et al 2017

A neural network system for transformation of regional cuisine style

“Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU ”, Devlin 2017

Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

“Adversarial Neural Machine Translation ”, Wu et al 2017

Adversarial Neural Machine Translation

“SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine ”, Dunn et al 2017

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

“Learning to Reason: End-To-End Module Networks for Visual Question Answering ”, Hu et al 2017

Learning to Reason: End-to-End Module Networks for Visual Question Answering

“Exploring Sparsity in Recurrent Neural Networks ”, Narang et al 2017

Exploring Sparsity in Recurrent Neural Networks

“Get To The Point: Summarization With Pointer-Generator Networks ”, See et al 2017

Get To The Point: Summarization with Pointer-Generator Networks

“DeepAR: Probabilistic Forecasting With Autoregressive Recurrent Networks ”, Salinas et al 2017

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

“Bayesian Recurrent Neural Networks ”, Fortunato et al 2017

Bayesian Recurrent Neural Networks

“Recurrent Environment Simulators ”, Chiappa et al 2017

Recurrent Environment Simulators

“Learning to Generate Reviews and Discovering Sentiment ”, Radford et al 2017

Learning to Generate Reviews and Discovering Sentiment

“Learning Simpler Language Models With the Differential State Framework ”, II et al 2017

Learning Simpler Language Models with the Differential State Framework

“I2T2I: Learning Text to Image Synthesis With Textual Data Augmentation ”, Dong et al 2017

I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation

“Improving Neural Machine Translation With Conditional Sequence Generative Adversarial Nets ”, Yang et al 2017

Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets

“Learned Optimizers That Scale and Generalize ”, Wichrowska et al 2017

Learned Optimizers that Scale and Generalize

“Parallel Multiscale Autoregressive Density Estimation ”, Reed et al 2017

Parallel Multiscale Autoregressive Density Estimation

“Tracking the World State With Recurrent Entity Networks ”, Henaff et al 2017

Tracking the World State with Recurrent Entity Networks

“Optimization As a Model for Few-Shot Learning ”, Ravi & Larochelle 2017

Optimization as a Model for Few-Shot Learning

“Neural Combinatorial Optimization With Reinforcement Learning ”, Bello et al 2017

Neural Combinatorial Optimization with Reinforcement Learning

“Frustratingly Short Attention Spans in Neural Language Modeling ”, Daniluk et al 2017

Frustratingly Short Attention Spans in Neural Language Modeling

“Tuning Recurrent Neural Networks With Reinforcement Learning ”, Jaques et al 2017

Tuning Recurrent Neural Networks with Reinforcement Learning

“Outrageously Large Neural Networks: The Sparsely-Gated Mixture-Of-Experts Layer ”, Shazeer et al 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

“Neural Data Filter for Bootstrapping Stochastic Gradient Descent ”, Fan et al 2017

Neural Data Filter for Bootstrapping Stochastic Gradient Descent

“Learning the Enigma With Recurrent Neural Networks ”, Greydanus 2017

Learning the Enigma with Recurrent Neural Networks :

View External Link:

https://greydanus.github.io/2017/01/07/enigma-rnn/

“Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization ”, Paulus 2017

Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization

“SampleRNN: An Unconditional End-To-End Neural Audio Generation Model ”, Mehri et al 2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

“Improving Neural Language Models With a Continuous Cache ”, Grave et al 2016

Improving Neural Language Models with a Continuous Cache

“NewsQA: A Machine Comprehension Dataset ”, Trischler et al 2016

NewsQA: A Machine Comprehension Dataset

“Neural Combinatorial Optimization With Reinforcement Learning ”, Bello et al 2016

Neural Combinatorial Optimization with Reinforcement Learning

“Visual Dialog ”, Das et al 2016

Visual Dialog

“Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation ”, Johnson et al 2016

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

“Learning to Learn without Gradient Descent by Gradient Descent ”, Chen et al 2016

Learning to Learn without Gradient Descent by Gradient Descent

“RL²: Fast Reinforcement Learning via Slow Reinforcement Learning ”, Duan et al 2016

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

“DeepCoder: Learning to Write Programs ”, Balog et al 2016

DeepCoder: Learning to Write Programs

“QRNNs: Quasi-Recurrent Neural Networks ”, Bradbury et al 2016

QRNNs: Quasi-Recurrent Neural Networks

“Neural Architecture Search With Reinforcement Learning ”, Zoph & Le 2016

Neural Architecture Search with Reinforcement Learning

“Bidirectional Attention Flow for Machine Comprehension ”, Seo et al 2016

Bidirectional Attention Flow for Machine Comprehension

“Hybrid Computing Using a Neural Network With Dynamic External Memory ”, Graves et al 2016

Hybrid computing using a neural network with dynamic external memory

“Scaling Memory-Augmented Neural Networks With Sparse Reads and Writes ”, Rae et al 2016

Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

“Using Fast Weights to Attend to the Recent Past ”, Ba et al 2016

Using Fast Weights to Attend to the Recent Past

“Achieving Human Parity in Conversational Speech Recognition ”, Xiong et al 2016

Achieving Human Parity in Conversational Speech Recognition

“VPN: Video Pixel Networks ”, Kalchbrenner et al 2016

VPN: Video Pixel Networks

“HyperNetworks ”, Ha et al 2016

HyperNetworks

“Pointer Sentinel Mixture Models ”, Merity et al 2016

Pointer Sentinel Mixture Models

“Multiplicative LSTM for Sequence Modeling ”, Krause et al 2016

Multiplicative LSTM for sequence modeling

“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation ”, Wu et al 2016

Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

“One Sentence One Model for Neural Machine Translation ”, Li et al 2016

One Sentence One Model for Neural Machine Translation

“Image-To-Markup Generation With Coarse-To-Fine Attention ”, Deng et al 2016

Image-to-Markup Generation with Coarse-to-Fine Attention

“Hierarchical Multiscale Recurrent Neural Networks ”, Chung et al 2016

Hierarchical Multiscale Recurrent Neural Networks

“Deep Learning Human Mind for Automated Visual Classification ”, Spampinato et al 2016

Deep Learning Human Mind for Automated Visual Classification

“Using the Output Embedding to Improve Language Models ”, Press & Wolf 2016

Using the Output Embedding to Improve Language Models

“Full Resolution Image Compression With Recurrent Neural Networks ”, Toderici et al 2016

Full Resolution Image Compression with Recurrent Neural Networks

“Decoupled Neural Interfaces Using Synthetic Gradients ”, Jaderberg et al 2016

Decoupled Neural Interfaces using Synthetic Gradients

“Clockwork Convnets for Video Semantic Segmentation ”, Shelhamer et al 2016

Clockwork Convnets for Video Semantic Segmentation

“Layer Normalization ”, Ba et al 2016

Layer Normalization

“Sequence-Level Knowledge Distillation ”, Kim & Rush 2016

Sequence-Level Knowledge Distillation

“LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks ”, Strobelt et al 2016

LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

“Learning to Learn by Gradient Descent by Gradient Descent ”, Andrychowicz et al 2016

Learning to learn by gradient descent by gradient descent

“Iterative Alternating Neural Attention for Machine Reading ”, Sordoni et al 2016

Iterative Alternating Neural Attention for Machine Reading

“Deep Reinforcement Learning for Dialogue Generation ”, Li et al 2016

Deep Reinforcement Learning for Dialogue Generation

“Programming With a Differentiable Forth Interpreter ”, Bošnjak et al 2016

Programming with a Differentiable Forth Interpreter

“Training Deep Nets With Sublinear Memory Cost ”, Chen et al 2016

Training Deep Nets with Sublinear Memory Cost

“Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex ”, Liao & Poggio 2016

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

“Improving Sentence Compression by Learning to Predict Gaze ”, Klerke et al 2016

Improving sentence compression by learning to predict gaze

“Adaptive Computation Time for Recurrent Neural Networks ”, Graves 2016

Adaptive Computation Time for Recurrent Neural Networks

“Dynamic Memory Networks for Visual and Textual Question Answering ”, Xiong et al 2016

Dynamic Memory Networks for Visual and Textual Question Answering

“PlaNet—Photo Geolocation With Convolutional Neural Networks ”, Weyand et al 2016

PlaNet—Photo Geolocation with Convolutional Neural Networks

“Learning Distributed Representations of Sentences from Unlabeled Data ”, Hill et al 2016

Learning Distributed Representations of Sentences from Unlabeled Data

“Exploring the Limits of Language Modeling ”, Jozefowicz et al 2016

Exploring the Limits of Language Modeling

“PixelRNN: Pixel Recurrent Neural Networks ”, Oord et al 2016

PixelRNN: Pixel Recurrent Neural Networks

“Persistent RNNs: Stashing Recurrent Weights On-Chip ”, Diamos et al 2016

Persistent RNNs: Stashing Recurrent Weights On-Chip

“Exploring the Limits of Language Modeling § 5.9: Samples from the Model ”

Exploring the Limits of Language Modeling § 5.9: Samples from the model :

View PDF:

/doc/www/arxiv.org/ec477c75170386e6fd6aff677d064cf95eb87a20.pdf#page=8&org=google

“Deep-Spying: Spying Using Smartwatch and Deep Learning ”, Beltramelli & Risi 2015

Deep-Spying: Spying using Smartwatch and Deep Learning

“Deep Speech 2: End-To-End Speech Recognition in English and Mandarin ”, Amodei et al 2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

“On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models ”, Schmidhuber 2015

On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

“Neural GPUs Learn Algorithms ”, Kaiser & Sutskever 2015

Neural GPUs Learn Algorithms

“Sequence Level Training With Recurrent Neural Networks ”, Ranzato et al 2015

Sequence Level Training with Recurrent Neural Networks

“Neural Programmer-Interpreters ”, Reed & Freitas 2015

Neural Programmer-Interpreters

“Generating Sentences from a Continuous Space ”, Bowman et al 2015

Generating Sentences from a Continuous Space

“Generative Concatenative Nets Jointly Learn to Write and Classify Reviews ”, Lipton et al 2015

Generative Concatenative Nets Jointly Learn to Write and Classify Reviews

“Generating Images from Captions With Attention ”, Mansimov et al 2015

Generating Images from Captions with Attention

“Semi-Supervised Sequence Learning ”, Dai & Le 2015

Semi-supervised Sequence Learning

“BPEs: Neural Machine Translation of Rare Words With Subword Units ”, Sennrich et al 2015

BPEs: Neural Machine Translation of Rare Words with Subword Units

“Training Recurrent Networks Online without Backtracking ”, Ollivier et al 2015

Training recurrent networks online without backtracking

“Deep Recurrent Q-Learning for Partially Observable MDPs ”, Hausknecht & Stone 2015

Deep Recurrent Q-Learning for Partially Observable MDPs

“A Neural Conversational Model ”, Vinyals & Le 2015

A Neural Conversational Model

“Teaching Machines to Read and Comprehend ”, Hermann et al 2015

Teaching Machines to Read and Comprehend

“Scheduled Sampling for Sequence Prediction With Recurrent Neural Networks ”, Bengio et al 2015

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

“Visualizing and Understanding Recurrent Networks ”, Karpathy et al 2015

Visualizing and Understanding Recurrent Networks

“The Unreasonable Effectiveness of Recurrent Neural Networks ”, Karpathy 2015

The Unreasonable Effectiveness of Recurrent Neural Networks

“Deep Neural Networks for Large Vocabulary Handwritten Text Recognition ”, Bluche 2015

Deep Neural Networks for Large Vocabulary Handwritten Text Recognition

“Reinforcement Learning Neural Turing Machines—Revised ”, Zaremba & Sutskever 2015

Reinforcement Learning Neural Turing Machines—Revised

“End-To-End Memory Networks ”, Sukhbaatar et al 2015

End-To-End Memory Networks

“LSTM: A Search Space Odyssey ”, Greff et al 2015

LSTM: A Search Space Odyssey

“Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets ”, Joulin & Mikolov 2015

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

“DRAW: A Recurrent Neural Network For Image Generation ”, Gregor et al 2015

DRAW: A Recurrent Neural Network For Image Generation

“Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews ”, Mesnil et al 2014

Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

“Deep Speech 1: Scaling up End-To-End Speech Recognition ”, Hannun et al 2014

Deep Speech 1: Scaling up end-to-end speech recognition

“Neural Turing Machines ”, Graves et al 2014

Neural Turing Machines

“Learning to Execute ”, Zaremba & Sutskever 2014

Learning to Execute

“Neural Machine Translation by Jointly Learning to Align and Translate ”, Bahdanau et al 2014

Neural Machine Translation by Jointly Learning to Align and Translate

“Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization ”, Dauphin et al 2014

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

“GRU: Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation ”, Cho et al 2014

GRU: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

“`doc2vec`: Distributed Representations of Sentences and Documents ”, Le & Mikolov 2014

doc2vec: Distributed Representations of Sentences and Documents

“A Clockwork RNN ”, Koutník et al 2014

A Clockwork RNN

“One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling ”, Chelba et al 2013

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

“Generating Sequences With Recurrent Neural Networks ”, Graves 2013

Generating Sequences With Recurrent Neural Networks

“Context Dependent Recurrent Neural Network Language Model ”, Mikolov & Zweig 2012

Context Dependent Recurrent Neural Network Language Model

“On the Difficulty of Training Recurrent Neural Networks ”, Pascanu et al 2012

On the difficulty of training Recurrent Neural Networks

“[Fitness Landscape of a Small RNN] ”, evolvingstuff 2011

[Fitness landscape of a small RNN] :

https://www.youtube.com/watch?v=VbR-qLnu3ng

“Recurrent Neural Network Based Language Model ”, Mikolov et al 2010

Recurrent Neural Network Based Language Model

“Large Language Models in Machine Translation ”, Brants et al 2007

Large Language Models in Machine Translation

“Learning to Learn Using Gradient Descent ”, Hochreiter et al 2001

Learning to Learn Using Gradient Descent

“Paul J. Werbos Interview ”, Werbos 1998

Paul J. Werbos interview :

View PDF:

/doc/ai/nn/rnn/1998-werbos.pdf

“Long Short-Term Memory ”, Hochreiter & Schmidhuber 1997

Long Short-Term Memory

“Flat Minima ”, Hochreiter & Schmidhuber 1997

Flat Minima

“Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity ”, Williams & Zipser 1995

Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity

“A Focused Backpropagation Algorithm for Temporal Pattern Recognition ”, Mozer 1995

A Focused Backpropagation Algorithm for Temporal Pattern Recognition

“Learning Complex, Extended Sequences Using the Principle of History Compression ”, Schmidhuber 1992

Learning Complex, Extended Sequences Using the Principle of History Compression

“Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks ”, Schmidhuber 1992b

Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

“Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks] ”, Hochreiter 1991

Untersuchungen zu dynamischen neuronalen Netzen [Studies of dynamic neural networks]

“Finding Structure In Time ”, Elman 1990

Finding Structure In Time

“Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS-495-90] ”, Mozer 1990

Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical report CU-CS-495-90]

“A Learning Algorithm for Continually Running Fully Recurrent Neural Networks ”, Williams & Zipser 1989b

A Learning Algorithm for Continually Running Fully Recurrent Neural Networks

“Recurrent Backpropagation and Hopfield Networks ”, Almeida & Neto 1989b

Recurrent Backpropagation and Hopfield Networks

“Backpropagation in Perceptrons With Feedback ”, Almeida 1989

Backpropagation in Perceptrons with Feedback

“A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks ”, Schmidhuber 1989

A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks

“Experimental Analysis of the Real-Time Recurrent Learning Algorithm ”, Williams & Zipser 1989

Experimental Analysis of the Real-time Recurrent Learning Algorithm

“A Sticky-Bit Approach for Learning to Represent State ”, Bachrach 1988

A Sticky-Bit Approach for Learning to Represent State :

View PDF:

/doc/ai/nn/rnn/1988-bachrach.pdf

“Generalization of Backpropagation With Application to a Recurrent Gas Market Model ”, Werbos 1988

Generalization of backpropagation with application to a recurrent gas market model

“Generalization of Back-Propagation to Recurrent Neural Networks ”, Pineda 1987

Generalization of back-propagation to recurrent neural networks

“The Utility Driven Dynamic Error Propagation Network (RTRL) ”, Robinson & Fallside 1987

The Utility Driven Dynamic Error Propagation Network (RTRL)

“A Self-Optimizing, Non-Symmetrical Neural Net for Content Addressable Memory and Pattern Recognition ”, Lapedes & Farber 1986

A self-optimizing, non-symmetrical neural net for content addressable memory and pattern recognition

“Programming a Massively Parallel, Computation Universal System: Static Behavior ”, Lapedes & Farber 1986b

Programming a massively parallel, computation universal system: Static behavior

“Serial Order: A Parallel Distributed Processing Approach ”, Jordan 1986

Serial Order: A Parallel Distributed Processing Approach

“Found in Translation: More Accurate, Fluent Sentences in Google Translate ”

Found in translation: More accurate, fluent sentences in Google Translate :

View External Link:

https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/

“Hypernetworks [Blog] ”, Ha 2025

Hypernetworks [blog]

“Safety-First AI for Autonomous Data Center Cooling and Industrial Control ”

Safety-first AI for autonomous data center cooling and industrial control

“Attention and Augmented Recurrent Neural Networks ”

Attention and Augmented Recurrent Neural Networks

“BlinkDL/RWKV-LM: RWKV Is an RNN With Transformer-Level LLM Performance. It Can Be Directly Trained like a GPT (Parallelizable). So It’s Combining the Best of RNN and Transformer—Great Performance, Fast Inference, Saves VRAM, Fast Training, "Infinite" Ctx_len, and Free Sentence Embedding. ”

BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it’s combining the best of RNN and transformer—great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

“RWKV-CLIP: A Robust Vision-Language Representation Learner ”

RWKV-CLIP: A Robust Vision-Language Representation Learner

“Efficient, Reusable RNNs and LSTMs for Torch ”

Efficient, reusable RNNs and LSTMs for torch

“Updated Training? ”

Updated training?

“Minimaxir/textgenrnn: Easily Train Your Own Text-Generating Neural Network of Any Size and Complexity on Any Text Dataset With a Few Lines of Code. ”

minimaxir/textgenrnn: Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

“Deep Learning for Assisting the Process of Music Composition (Part 3) ”

Deep learning for assisting the process of music composition (part 3) :

View External Link:

https://highnoongmt.wordpress.com/2015/08/13/deep-learning-for-assisting-the-process-of-music-composition-part-3/

“Kaichengalex/YFCC15M ”

Kaichengalex/YFCC15M

“Metalearning or Learning to Learn Since 1987 ”

Metalearning or Learning to Learn Since 1987 :

View HTML:

/doc/www/people.idsia.ch/76c24cf0db4abf4ce2b77d22182272d8e62d1a28.html

“Stream Seaandsailor ”

Stream seaandsailor

“Caglar Gulcehre Homepage ”, Gulcehre 2025

Caglar Gulcehre homepage :

View HTML:

/doc/www/www.caglarg.com/28c7d272ec18d3bb25ef7c5e1714a548d244c466.html

“Composing Music With Recurrent Neural Networks ”

Composing Music With Recurrent Neural Networks :

View HTML:

/doc/www/www.danieldjohnson.com/0108f05c124cfb8b547e784dba32a6a5be813f44.html

“RWKV Language Model ”

RWKV Language Model

Sort By Magic

Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.

`deep-reinforcement-learning`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`meta-learning`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`language-modeling`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

Wikipedia (5)

Hopfield network
Jürgen Schmidhuber
Long short-term memory
Paul Werbos :

https://en.wikipedia.org/wiki/Paul_Werbos
Recurrent neural network

Miscellaneous

Bibliography

https://arxiv.org/abs/2504.05298#nvidia: “One-Minute Video Generation With Test-Time Training ”, Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, Youjin Song, Shihao Han, Ka Chun Cheung, Jan Kautz, Carlos Guestrin, Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang

link-bibliography
https://arxiv.org/abs/2503.14456: “RWKV-7 ‘Goose’ With Expressive Dynamic State Evolution ”, Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng

link-bibliography
https://arxiv.org/abs/2412.07752: “FlashRNN: Optimizing Traditional RNNs on Modern Hardware ”, Korbinian Pöppel, Maximilian Beck, Sepp Hochreiter

link-bibliography
https://arxiv.org/abs/2412.06464#nvidia: “Gated Delta Networks: Improving Mamba-2 With Delta Rule ”, Songlin Yang, Jan Kautz, Ali Hatamizadeh

link-bibliography
https://arxiv.org/abs/2410.01201: “Were RNNs All We Needed? ”, Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadegh

link-bibliography
https://arxiv.org/abs/2408.15237: “The Mamba in the Llama: Distilling and Accelerating Hybrid Models ”, Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

link-bibliography
https://arxiv.org/abs/2406.07887: “An Empirical Study of Mamba-Based Language Models ”, Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro

link-bibliography
https://arxiv.org/abs/2406.06973: “RWKV-CLIP: A Robust Vision-Language Representation Learner ”, Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng

link-bibliography
https://arxiv.org/abs/2405.20233: “Grokfast: Accelerated Grokking by Amplifying Slow Gradients ”, Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee

link-bibliography
https://arxiv.org/abs/2404.08801#facebook: “Megalodon: Efficient LLM Pretraining and Inference With Unlimited Context Length ”, Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

link-bibliography
https://arxiv.org/abs/2404.05971#eleutherai: “Does Transformer Interpretability Transfer to RNNs? ”, Gonçalo Paulo, Thomas Marshall, Nora Belrose

link-bibliography
https://arxiv.org/abs/2403.17844: “Mechanistic Design and Scaling of Hybrid Architectures ”, Michael Poli, Armin W. Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

link-bibliography
https://arxiv.org/abs/2403.13802: “ZigMa: Zigzag Mamba Diffusion Model ”, Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Bjorn Ommer

link-bibliography
https://arxiv.org/abs/2312.04927: “Zoology: Measuring and Improving Recall in Efficient Language Models ”, Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Y. Zou, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2312.00752: “Mamba: Linear-Time Sequence Modeling With Selective State Spaces ”, Albert Gu, Tri Dao

link-bibliography
https://www.nature.com/articles/s41467-023-42875-2#deepmind: “Learning Few-Shot Imitation As Cultural Transmission ”, Avishkar Bhoopchand, Bethanie Brownfield, Adrian Collister, Agustin Dal Lago, Ashley Edwards, Richard Everett, Alexandre Fréchette, Yanko Gitahy Oliveira, Edward Hughes, Kory W. Mathewson, Piermaria Mendolicchio, Julia Pawar, Miruna Pȋslar, Alex Platonov, Evan Senter, Sukhdeep Singh, Alexander Zacherl, Lei M. Zhang

link-bibliography
https://arxiv.org/abs/2310.02980: “Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors ”, Ido Amos, Jonathan Berant, Ankit Gupta

link-bibliography
https://arxiv.org/abs/2306.00323: “Thought Cloning: Learning to Think While Acting by Imitating Human Thinking ”, Shengran Hu, Jeff Clune

link-bibliography
https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era ”, Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdin, Mom, Atsushi Saito, Xiangru Tang, Bolun Wang, Johan S. Wind, Stansilaw Wozniak, Ruichong Zhang, Zhenyuan Zhang, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu

link-bibliography
https://arxiv.org/abs/2303.06349#deepmind: “Resurrecting Recurrent Neural Networks for Long Sequences ”, Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

link-bibliography
https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-Trained Language Model With Spiking Neural Networks ”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian

link-bibliography
2023-bures.pdf: “Organic Reaction Mechanism Classification Using Machine Learning ”, Jordi Burés, Igor Larrosa

link-bibliography
https://arxiv.org/abs/2212.14052: “Hungry Hungry Hippos: Towards Language Modeling With State Space Models ”, Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

link-bibliography
https://arxiv.org/abs/2212.10544: “Pretraining Without Attention ”, Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

link-bibliography
https://arxiv.org/abs/2211.07638: “Legged Locomotion in Challenging Terrains Using Egocentric Vision ”, Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak

link-bibliography
https://arxiv.org/abs/2210.01117: “Omnigrok: Grokking Beyond Algorithmic Data ”, Ziming Liu, Eric J. Michaud, Max Tegmark

link-bibliography
https://arxiv.org/abs/2209.11737: “Semantic Scene Descriptions As an Objective of Human Vision ”, Adrien Doerig, Tim C. Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Ian Charest

link-bibliography
https://arxiv.org/abs/2205.01972: “Sequencer: Deep LSTM for Image Classification ”, Yuki Tatsunami, Masato Taki

link-bibliography
2022-grand.pdf: “Semantic Projection Recovers Rich Human Knowledge of Multiple Object Features from Word Embeddings ”, Gabriel Grand, Idan Asher Blank, Francisco Pereira, Evelina Fedorenko

link-bibliography
https://arxiv.org/abs/2203.07852: “Block-Recurrent Transformers ”, DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

link-bibliography
https://arxiv.org/abs/2202.07765#deepmind: “General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR ”, Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

link-bibliography
2022-miki.pdf: “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild ”, Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

link-bibliography
https://arxiv.org/abs/2111.00396: “S4: Efficiently Modeling Long Sequences With Structured State Spaces ”, Albert Gu, Karan Goel, Christopher Ré

link-bibliography
https://elifesciences.org/articles/66039: “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-Dependent Action Selection ”, Brad K. Hulse, Hannah Haberkern, Romain Franconville, Daniel B. Turner-Evans, Shin-ya Takemura, Tanya Wolff, Marcella Noorman, Marisa Dreher, Chuntao Dan, Ruchi Parekh, Ann M. Hermundstad, Gerald M. Rubin, Vivek Jayaraman

link-bibliography
https://arxiv.org/abs/2107.14795#deepmind: “Perceiver IO: A General Architecture for Structured Inputs & Outputs ”, Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira

link-bibliography
https://proceedings.mlr.press/v139/vicol21a.html: “PES: Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies ”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein

link-bibliography
2021-delul.pdf: “Shelley: A Crowd-Sourced Collaborative Horror Writer ”, Pinar Yanardag Delul, Manuel Cebrian, Iyad Rahwan

link-bibliography
2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i ”, Norman P. Jouppi, Doe Hyun Yoon, Matthew Ashcraft, Mark Gottscho, Thomas B. Jablin, George Kurian, James Laudon, Sheng Li, Peter Ma, Xiaoyu Ma, Thomas Norrie, Nishant Patil, Sushma Prasad, Cliff Young, Zongwei Zhou, David Patterson

link-bibliography
https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers ”, Gail Weiss, Yoav Goldberg, Eran Yahav

link-bibliography
https://arxiv.org/abs/2103.03206#deepmind: “Perceiver: General Perception With Iterative Attention ”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira

link-bibliography
https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks ”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian

link-bibliography
https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning ”, Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

link-bibliography
https://arxiv.org/abs/2008.07669: “HiPPO: Recurrent Memory With Optimal Polynomial Projections ”, Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re

link-bibliography
https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms ”, Adam Scholl

link-bibliography
https://deepmind.google/discover/blog/agent57-outperforming-the-human-atari-benchmark/: “Agent57: Outperforming the Human Atari Benchmark ”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell

link-bibliography
https://arxiv.org/abs/2002.03629: “Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving ”, Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon

link-bibliography
https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models ”, Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

link-bibliography
https://arxiv.org/abs/1911.11423: “Single Headed Attention RNN: Stop Thinking With Your Head ”, Stephen Merity

link-bibliography
https://openreview.net/forum?id=HyxlRHBlUB: “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks ”, Aaron R. Voelker, Ivana Kajić, Chris Eliasmith

link-bibliography
https://arxiv.org/abs/1910.06591#deepmind: “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference ”, Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski

link-bibliography
https://arxiv.org/abs/1909.01792#deepmind: “Mogrifier LSTM ”, Gábor Melis, Tomáš Kočiský, Phil Blunsom

link-bibliography
https://paperswithcode.com/task/language-modelling: “Language Modeling State-Of-The-Art Leaderboards ”, paperswithcode.com

link-bibliography
https://arxiv.org/abs/1906.08237: “XLNet: Generalized Autoregressive Pretraining for Language Understanding ”, Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le

link-bibliography
https://arxiv.org/abs/1905.01320#deepmind: “Meta-Learners’ Learning Dynamics Are unlike Learners’ ”, Neil C. Rabinowitz

link-bibliography
https://arxiv.org/abs/1901.02860: “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context ”, Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov

link-bibliography
https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html#openai: “Meta-Learning: Learning to Learn Fast ”, Lilian Weng

link-bibliography
https://openreview.net/forum?id=r1lyTjAqYX#deepmind: “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning ”, Steven Kapturowski, Georg Ostrovski, John Quan, Rémi Munos, Will Dabney

link-bibliography
https://arxiv.org/abs/1807.03819#googledeepmind: “Universal Transformers ”, Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

link-bibliography
https://arxiv.org/abs/1806.08730#salesforce: “The Natural Language Decathlon: Multitask Learning As Question Answering ”, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

link-bibliography
https://arxiv.org/abs/1801.06146: “ULMFiT: Universal Language Model Fine-Tuning for Text Classification ”, Jeremy Howard, Sebastian Ruder

link-bibliography
https://smerity.com/articles/2018/limited_compute.html: “The Compute and Data Moats Are Dead ”, Stephen Merity

link-bibliography
https://arxiv.org/abs/1709.07432: “Dynamic Evaluation of Neural Sequence Models ”, Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

link-bibliography
https://arxiv.org/abs/1709.02755: “SRU: Simple Recurrent Units for Highly Parallelizable Recurrence ”, Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi

link-bibliography
https://arxiv.org/abs/1704.05179: “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine ”, Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

link-bibliography
https://arxiv.org/abs/1704.05526: “Learning to Reason: End-To-End Module Networks for Visual Question Answering ”, Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko

link-bibliography
https://arxiv.org/abs/1609.09106#google: “HyperNetworks ”, David Ha, Andrew Dai, Quoc V. Le

link-bibliography
https://arxiv.org/abs/1608.03609: “Clockwork Convnets for Video Semantic Segmentation ”, Evan Shelhamer, Kate Rakelly, Judy Hoffman, Trevor Darrell

link-bibliography
https://arxiv.org/abs/1512.02595#baidu: “Deep Speech 2: End-To-End Speech Recognition in English and Mandarin ”, Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu

link-bibliography
https://arxiv.org/abs/1503.08895: “End-To-End Memory Networks ”, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus

link-bibliography
2012-mikolov.pdf: “Context Dependent Recurrent Neural Network Language Model ”, Tomas Mikolov, Geoffrey Zweig

link-bibliography
2010-mikolov.pdf: “Recurrent Neural Network Based Language Model ”, Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur

link-bibliography
1991-hochreiter.pdf: “Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks] ”, Sepp Hochreiter

link-bibliography
1989-williams.pdf: “Experimental Analysis of the Real-Time Recurrent Learning Algorithm ”, Ronald J. Williams, David Zipser

link-bibliography
1988-werbos.pdf: “Generalization of Backpropagation With Application to a Recurrent Gas Market Model ”, Paul Joseph Werbos

link-bibliography
1987-pineda.pdf: “Generalization of Back-Propagation to Recurrent Neural Networks ”, Fernando J. Pineda

link-bibliography

[Quote Of The Day]

[Site Of The Day]

[Annotation Of The Day]

[adblock public service announcement]