See Also
Links
“RWKV: Reinventing RNNs for the Transformer Era”, Peng et al 2023
“Resurrecting Recurrent Neural Networks for Long Sequences”, Orvieto et al 2023
“SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
“Organic Reaction Mechanism Classification Using Machine Learning”, Burés & Larrosa 2023
“A High-performance Speech Neuroprosthesis”, Willett et al 2023
“Hungry Hungry Hippos: Towards Language Modeling With State Space Models”, Fu et al 2022
“Melting Pot 2.0”, Agapiou et al 2022
“VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022
“Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Agarwal et al 2022
“Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities”, et al 2022
“Semantic Scene Descriptions As an Objective of Human Vision”, Doerig et al 2022
“Benchmarking Compositionality With Formal Languages”, et al 2022
“PI-ARS: Accelerating Evolution-Learned Visual-Locomotion With Predictive Information Representations”, et al 2022
“Spatial Representation by Ramping Activity of Neurons in the Retrohippocampal Cortex”, et al 2022
“Neural Networks and the Chomsky Hierarchy”, Delétang et al 2022
“BYOL-Explore: Exploration by Bootstrapped Prediction”, Guo et al 2022
“AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos”, et al 2022
“Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline (3RL)”, Caccia et al 2022
“Simple Recurrence Improves Masked Language Models”, Lei et al 2022
“Sequencer: Deep LSTM for Image Classification”, Tatsunami & Taki 2022
“Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers”, Chan et al 2022
“Block-Recurrent Transformers”, Hutchins et al 2022
“Learning by Directional Gradient Descent”, Silver et al 2022
“Retrieval-Augmented Reinforcement Learning”, Goyal et al 2022
“General-purpose, Long-context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022
“End-to-end Algorithm Synthesis With Recurrent Networks: Logical Extrapolation Without Overthinking”, Bansal et al 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
“Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies”, Rao et al 2022
“Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild”, Miki et al 2022
“Inducing Causal Structure for Interpretable Neural Networks (IIT)”, Geiger et al 2021
“Evaluating Distributional Distortion in Neural Language Modeling”, 2021
“Gradients Are Not All You Need”, Metz et al 2021
“An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
“Minimum Description Length Recurrent Neural Networks”, Lan et al 2021
“S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Gu et al 2021
“A Connectome of The Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-dependent Action Selection”, Hulse et al 2021
“LSSL: Combining Recurrent, Convolutional, and Continuous-time Models With Linear State-Space Layers”, Gu et al 2021
“Recurrent Model-Free RL Is a Strong Baseline for Many POMDPs”, Ni et al 2021
“Photos Are All You Need for Reciprocal Recommendation in Online Dating”, Neve & McConville 2021
“Perceiver IO: A General Architecture for Structured Inputs & Outputs”, Jaegle et al 2021
“Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Vicol et al 2021
“Shelley: A Crowd-sourced Collaborative Horror Writer”, Delul et al 2021
“Ten Lessons From Three Generations Shaped Google’s TPUv4i”, Jouppi et al 2021
“RASP: Thinking Like Transformers”, Weiss et al 2021
“Scaling Laws for Acoustic Models”, Droppo & Elibol 2021
“Scaling End-to-End Models for Large-Scale Multilingual ASR”, Li et al 2021
“Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation”, Parisotto & Salakhutdinov 2021
“Finetuning Pretrained Transformers into RNNs”, Kasai et al 2021
“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
“Perceiver: General Perception With Iterative Attention”, Jaegle et al 2021
“When Attention Meets Fast Recurrence: Training SRU++ Language Models With Reduced Compute”, Lei 2021
“Predictive Coding Is a Consequence of Energy Efficiency in Recurrent Neural Networks”, Ali et al 2021
“Deep Residual Learning in Spiking Neural Networks”, Fang et al 2021
“Distilling Large Language Models into Tiny and Effective Students Using pQRNN”, Kaliamoorthi et al 2021
“Meta Learning Backpropagation And Improving It”, Kirsch & Schmidhuber 2020
“On the Binding Problem in Artificial Neural Networks”, Greff et al 2020
“A Recurrent Vision-and-Language BERT for Navigation”, Hong et al 2020
“Towards Playing Full MOBA Games With Deep Reinforcement Learning”, Ye et al 2020
“Adversarial Vulnerabilities of Human Decision-making”, Dezfouli et al 2020
“Learning to Summarize Long Texts With Memory Compression and Transfer”, et al 2020
“Human-centric Dialog Training via Offline Reinforcement Learning”, Jaques et al 2020
“AFT: An Attention Free Transformer”, 2020
“Deep Reinforcement Learning for Closed-Loop Blood Glucose Control”, Fox et al 2020
“HiPPO: Recurrent Memory With Optimal Polynomial Projections”, Gu et al 2020
“Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size”, et al 2020
“Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Scholl 2020
“DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, Ren et al 2020
“High-performance Brain-to-text Communication via Imagined Handwriting”, Willett et al 2020
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
“The Recurrent Neural Tangent Kernel”, Alemohammad et al 2020
“Untangling Tradeoffs between Recurrence and Self-attention in Neural Networks”, et al 2020
“Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing”, Dai et al 2020
“Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models”, Papadimitriou & Jurafsky 2020
“Syntactic Structure from Deep Learning”, Linzen & Baroni 2020
“Agent57: Outperforming the Human Atari Benchmark”, Badia et al 2020
“Machine Translation of Cortical Activity to Text With an Encoder-decoder Framework”, Makin et al 2020
“Learning-based Memory Allocation for C++ Server Workloads”, Maas et al 2020
“Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving”, Song et al 2020
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
“Scaling Laws for Neural Language Models”, Kaplan et al 2020
“Placing Language in an Integrated Understanding System: Next Steps toward Human-level Performance in Neural Language Models”, McClelland et al 2020
“Estimating the Deep Replicability of Scientific Findings Using Human and Artificial Intelligence”, Yang et al 2020
“Single Headed Attention RNN: Stop Thinking With Your Head”, Merity 2019
“Excavate”, 2019
“MuZero: Mastering Atari, Go, Chess and Shogi by Planning With a Learned Model”, Schrittwieser et al 2019
“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
“Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks”, Voelker et al 2019
“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks”, Villegas et al 2019
“SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Espeholt et al 2019
“Mixed-Signal Neuromorphic Processors: Quo Vadis?”, et al 2019
“R2D3: Making Efficient Use of Demonstrations to Solve Hard Exploration Problems”, Paine et al 2019
“Language Modelling State-of-the-art Leaderboards”, Paperswithcode.com 2019
“Metalearned Neural Memory”, Munkhdalai et al 2019
“Generating Text With Recurrent Neural Networks”, et al 2019
“Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, et al 2019
“XLNet: Generalized Autoregressive Pretraining for Language Understanding”, Yang et al 2019
“Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Yu et al 2019
“Reinforcement Learning, Fast and Slow”, Botvinick et al 2019
“MoGlow: Probabilistic and Controllable Motion Synthesis Using Normalizing Flows”, Henter et al 2019
“Meta-learners’ Learning Dynamics Are unlike Learners’”, Rabinowitz 2019
“Speech Synthesis from Neural Decoding of Spoken Sentences”, Anumanchipalli et al 2019
“Good News, Everyone! Context Driven Entity-aware Captioning for News Images”, Biten et al 2019
“Surrogate Gradient Learning in Spiking Neural Networks”, Neftci et al 2019
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
“Natural Questions: A Benchmark for Question Answering Research”, Kwiatkowski et al 2019
“High Fidelity Video Prediction With Large Stochastic Recurrent Neural Networks: Videos”, Villegas et al 2019
“Bayesian Layers: A Module for Neural Network Uncertainty”, Tran et al 2018
“Meta-Learning: Learning to Learn Fast”, Weng 2018
“Piano Genie”, Donahue et al 2018
“Learning Recurrent Binary/Ternary Weights”, Ardakani et al 2018
“R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Kapturowski et al 2018
“HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering”, Yang et al 2018
“Adversarial Reprogramming of Text Classification Neural Networks”, et al 2018
“This Time With Feeling: Learning Expressive Musical Performance”, Oore et al 2018
“Character-Level Language Modeling With Deeper Self-Attention”, Al-Rfou et al 2018
“General Value Function Networks”, Schlegel et al 2018
“Universal Transformers”, Dehghani et al 2018
“Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme”, Lau et al 2018
“Accurate Uncertainties for Deep Learning Using Calibrated Regression”, Kuleshov et al 2018
“Neural Ordinary Differential Equations”, Chen et al 2018
“Know What You Don’t Know: Unanswerable Questions for SQuAD”, Rajpurkar et al 2018
“Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data”, et al 2018
“Hierarchical Neural Story Generation”, Fan et al 2018
“Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context”, Khandelwal et al 2018
“Newsroom: A Dataset of 1.3 Million Summaries With Diverse Extractive Strategies”, Grusky et al 2018
“A Tree Search Algorithm for Sequence Labeling”, et al 2018
“An Analysis of Neural Language Modeling at Multiple Scales”, Merity et al 2018
“Reviving and Improving Recurrent Back-Propagation”, Liao et al 2018
“Learning Memory Access Patterns”, Hashemi et al 2018
“Learning Longer-term Dependencies in RNNs With Auxiliary Losses”, Trinh et al 2018
“One Big Net For Everything”, Schmidhuber 2018
“Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018
“Deep Contextualized Word Representations”, Peters et al 2018
“M-Walk: Learning to Walk over Graphs Using Monte Carlo Tree Search”, Shen et al 2018
“ULMFiT: Universal Language Model Fine-tuning for Text Classification”, Howard & Ruder 2018
“Large-scale Comparison of Machine Learning Methods for Drug Target Prediction on ChEMBL”, Mayr et al 2018
“A Flexible Approach to Automated RNN Architecture Generation”, Schrimpf et al 2017
“Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, et al 2017
“Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent”, et al 2017
“Evaluating Prose Style Transfer With the Bible”, Carlson et al 2017
“Breaking the Softmax Bottleneck: A High-Rank RNN Language Model”, Yang et al 2017
“Neural Speed Reading via Skim-RNN”, Seo et al 2017
“Generalization without Systematicity: On the Compositional Skills of Sequence-to-sequence Recurrent Networks”, Lake & Baroni 2017
“Mixed Precision Training”, Micikevicius et al 2017
“To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression”, Zhu & Gupta 2017
“N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning”, Ashok et al 2017
“Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification”, et al 2017
“SRU: Simple Recurrent Units for Highly Parallelizable Recurrence”, Lei et al 2017
“Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks”, Jayaraman & Grauman 2017
“Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks”, Campos et al 2017
“Twin Networks: Matching the Future for Sequence Generation”, et al 2017
“Revisiting Activation Regularization for Language RNNs”, Merity et al 2017
“Bayesian Sparsification of Recurrent Neural Networks”, Lobacheva et al 2017
“On the State-of-the-Art of Evaluation in Neural Language Models”, Melis et al 2017
“Controlling Linguistic Style Aspects in Neural Language Generation”, Ficler & Goldberg 2017
“Device Placement Optimization With Reinforcement Learning”, Mirhoseini et al 2017
“Language Generation With Recurrent Generative Adversarial Networks without Pre-training”, Press et al 2017
“Biased Importance Sampling for Deep Neural Network Training”, Katharopoulos & Fleuret 2017
“Deriving Neural Architectures from Sequence and Graph Kernels”, Lei et al 2017
“A Deep Reinforced Model for Abstractive Summarization”, Paulus et al 2017
“DeepTingle”, Khalifa et al 2017
“TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension”, Joshi et al 2017
“A Neural Network System for Transformation of Regional Cuisine Style”, Kazama et al 2017
“Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU”, Devlin 2017
“Adversarial Neural Machine Translation”, et al 2017
“Learning to Reason: End-to-End Module Networks for Visual Question Answering”, Hu et al 2017
“SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Dunn et al 2017
“Exploring Sparsity in Recurrent Neural Networks”, Narang et al 2017
“DeepAR: Probabilistic Forecasting With Autoregressive Recurrent Networks”, et al 2017
“Recurrent Environment Simulators”, Chiappa et al 2017
“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
“I2T2I: Learning Text to Image Synthesis With Textual Data Augmentation”, Dong et al 2017
“Improving Neural Machine Translation With Conditional Sequence Generative Adversarial Nets”, et al 2017
“Learned Optimizers That Scale and Generalize”, Wichrowska et al 2017
“Parallel Multiscale Autoregressive Density Estimation”, Reed et al 2017
“Tracking the World State With Recurrent Entity Networks”, Henaff et al 2017
“Optimization As a Model for Few-Shot Learning”, Ravi & Larochelle 2017
“Neural Combinatorial Optimization With Reinforcement Learning”, Bello et al 2017
“Tuning Recurrent Neural Networks With Reinforcement Learning”, Jaques et al 2017
“Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer”, Shazeer et al 2017
“Neural Data Filter for Bootstrapping Stochastic Gradient Descent”, et al 2017
“SampleRNN: An Unconditional End-to-End Neural Audio Generation Model”, Mehri et al 2016
“Improving Neural Language Models With a Continuous Cache”, Grave et al 2016
“NewsQA: A Machine Comprehension Dataset”, Trischler et al 2016
“Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, Johnson et al 2016
“Learning to Learn without Gradient Descent by Gradient Descent”, Chen et al 2016
“RL2: Fast Reinforcement Learning via Slow Reinforcement Learning”, Duan et al 2016
“DeepCoder: Learning to Write Programs”, Balog et al 2016
“Bidirectional Attention Flow for Machine Comprehension”, Seo et al 2016
“Neural Architecture Search With Reinforcement Learning”, Zoph & Le 2016
“QRNNs: Quasi-Recurrent Neural Networks”, Bradbury et al 2016
“Hybrid Computing Using a Neural Network With Dynamic External Memory”, Graves et al 2016
“Using Fast Weights to Attend to the Recent Past”, Ba et al 2016
“Achieving Human Parity in Conversational Speech Recognition”, Xiong et al 2016
“HyperNetworks”, Ha et al 2016
“Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”, Wu et al 2016
“Pointer Sentinel Mixture Models”, Merity et al 2016
“Deep Learning Human Mind for Automated Visual Classification”, Spampinato et al 2016
“Decoupled Neural Interfaces Using Synthetic Gradients”, Jaderberg et al 2016
“Full Resolution Image Compression With Recurrent Neural Networks”, Toderici et al 2016
“LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks”, Strobelt et al 2016
“Learning to Learn by Gradient Descent by Gradient Descent”, Andrychowicz et al 2016
“Iterative Alternating Neural Attention for Machine Reading”, Sordoni et al 2016
“Deep Reinforcement Learning for Dialogue Generation”, Li et al 2016
“Programming With a Differentiable Forth Interpreter”, Bošnjak et al 2016
“Training Deep Nets With Sublinear Memory Cost”, Chen et al 2016
“Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex”, Liao & Poggio 2016
“Improving Sentence Compression by Learning to Predict Gaze”, Klerke et al 2016
“Adaptive Computation Time for Recurrent Neural Networks”, Graves 2016
“Dynamic Memory Networks for Visual and Textual Question Answering”, Xiong et al 2016
“PlaNet—Photo Geolocation With Convolutional Neural Networks”, Weyand et al 2016
“Learning Distributed Representations of Sentences from Unlabeled Data”, Hill et al 2016
“Exploring the Limits of Language Modeling”, Jozefowicz et al 2016
“Pixel Recurrent Neural Networks”, van den Oord et al 2016
“Persistent RNNs: Stashing Recurrent Weights On-Chip”, Diamos et al 2016
“Deep-Spying: Spying Using Smartwatch and Deep Learning”, Beltramelli & Risi 2015
“On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”, Schmidhuber 2015
“Neural GPUs Learn Algorithms”, Kaiser & Sutskever 2015
“Sequence Level Training With Recurrent Neural Networks”, Ranzato et al 2015
“Generating Sentences from a Continuous Space”, Bowman et al 2015
“Generative Concatenative Nets Jointly Learn to Write and Classify Reviews”, Lipton et al 2015
“Generating Images from Captions With Attention”, Mansimov et al 2015
“Semi-supervised Sequence Learning”, Dai & Le 2015
“RNN Metadata for Mimicking Author Style”, Gwern 2015
“Deep Recurrent Q-Learning for Partially Observable MDPs”, Hausknecht & Stone 2015
“Teaching Machines to Read and Comprehend”, Hermann et al 2015
“Scheduled Sampling for Sequence Prediction With Recurrent Neural Networks”, Bengio et al 2015
“Visualizing and Understanding Recurrent Networks”, Karpathy et al 2015
“The Unreasonable Effectiveness of Recurrent Neural Networks”, Karpathy 2015
“Deep Neural Networks for Large Vocabulary Handwritten Text Recognition”, Bluche 2015
“Reinforcement Learning Neural Turing Machines—Revised”, Zaremba & Sutskever 2015
“End-To-End Memory Networks”, Sukhbaatar et al 2015
“Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets”, Joulin & Mikolov 2015
“DRAW: A Recurrent Neural Network For Image Generation”, Gregor et al 2015
“Neural Turing Machines”, Graves et al 2014
“Learning to Execute”, Zaremba & Sutskever 2014
“Neural Machine Translation by Jointly Learning to Align and Translate”, Bahdanau et al 2014
“Distributed Representations of Sentences and Documents”, Le & Mikolov 2014
“One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling”, Chelba et al 2013
“Generating Sequences With Recurrent Neural Networks”, Graves 2013
“Recurrent Neural Network Based Language Model”, Mikolov et al 2010
“Large Language Models in Machine Translation”, Brants et al 2007
“Learning to Learn Using Gradient Descent”, Hochreiter et al 2001
“Long Short-Term Memory”, Hochreiter & Schmidhuber 1997
“Flat Minima”, Hochreiter & Schmidhuber 1997
“Gradient-Based Learning Algorithms for Recurrent Networks and Their Computational Complexity”, Williams & Zipser 1995
“A Focused Backpropagation Algorithm for Temporal Pattern Recognition”, Mozer 1995
“Learning Complex, Extended Sequences Using the Principle of History Compression”, Schmidhuber 1992
“Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks”, Schmidhuber 1992
“Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, Hochreiter 1991
“Finding Structure In Time”, Elman 1990
“Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS-495-90]”, Mozer 1990
“A Learning Algorithm for Continually Running Fully Recurrent Neural Networks”, Williams & Zipser 1989b
“A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks”, Schmidhuber 1989
“Experimental Analysis of the Real-time Recurrent Learning Algorithm”, Williams & Zipser 1989
“A Sticky-Bit Approach for Learning to Represent State”, 1988
“The Utility Driven Dynamic Error Propagation Network (RTRL)”, Robinson & Fallside 1987
“Serial Order: A Parallel Distributed Processing Approach”, Jordan 1986
“Attention and Augmented Recurrent Neural Networks”
“Deep Learning for Assisting the Process of Music Composition (part 3)”
Wikipedia
Miscellaneous
- /doc/ai/nn/rnn/2021-jaegle-figure2-perceiverioarchitecture.png
- /doc/ai/nn/rnn/2021-droppo-figure5-lstmvstransfomerscaling.png
- /doc/ai/nn/rnn/2020-deepmind-agent57-figure3-deepreinforcementlearningtimeline.svg
- /doc/ai/nn/rnn/2017-khalifa-example3-incoherentdeeptinglesamplepromptedwithmobydickcallmeishmael.png
- /doc/ai/nn/rnn/2016-06-09-rossgoodwin-adventuresinnarratedreality-2.html
- /doc/ai/nn/rnn/2016-03-19-rossgoodwin-adventuresinnarratedreality-1.html
- /doc/ai/nn/rnn/2015-06-03-karpathy-charrnn-visualization.tar.xz
- http://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
- https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html
- https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
- https://patentimages.storage.googleapis.com/57/53/22/91b8a6792dbb1e/US20180204116A1.pdf#deepmind
- https://soundcloud.com/seaandsailor/sets/char-rnn-composes-irish-folk-music
- https://twitter.com/arankomatsuzaki/status/1639000379978403853
- https://www.danieldjohnson.com/2015/08/03/composing-music-with-recurrent-neural-networks/
- https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/
Link Bibliography
- https://arxiv.org/abs/2305.13048: “RWKV: Reinventing RNNs for the Transformer Era”
- https://arxiv.org/abs/2303.06349#deepmind: “Resurrecting Recurrent Neural Networks for Long Sequences”, Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De
- https://arxiv.org/abs/2302.13939: “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian
- 2023-bures.pdf: “Organic Reaction Mechanism Classification Using Machine Learning”, Jordi Burés, Igor Larrosa
- https://arxiv.org/abs/2212.14052: “Hungry Hungry Hippos: Towards Language Modeling With State Space Models”, Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
- https://arxiv.org/abs/2211.07638: “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak
- https://arxiv.org/abs/2209.11737: “Semantic Scene Descriptions As an Objective of Human Vision”, Adrien Doerig, Tim C. Kietzmann, Emily Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Ian Charest
- https://arxiv.org/abs/2205.01972: “Sequencer: Deep LSTM for Image Classification”, Yuki Tatsunami, Masato Taki
- https://arxiv.org/abs/2203.07852: “Block-Recurrent Transformers”, DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur
- https://arxiv.org/abs/2202.07765#deepmind: “General-purpose, Long-context Autoregressive Modeling With Perceiver AR”
- 2022-miki.pdf: “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild”, Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter
- https://arxiv.org/abs/2111.00396: “S4: Efficiently Modeling Long Sequences With Structured State Spaces”, Albert Gu, Karan Goel, Christopher Ré
- https://elifesciences.org/articles/66039: “A Connectome of the Drosophila Central Complex Reveals Network Motifs Suitable for Flexible Navigation and Context-dependent Action Selection”
- https://arxiv.org/abs/2107.14795#deepmind: “Perceiver IO: A General Architecture for Structured Inputs & Outputs”
- https://proceedings.mlr.press/v139/vicol21a.html: “Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein
- 2021-delul.pdf: “Shelley: A Crowd-sourced Collaborative Horror Writer”, Pinar Yanardag Delul, Manuel Cebrian, Iyad Rahwan
- 2021-jouppi.pdf: “Ten Lessons From Three Generations Shaped Google’s TPUv4i”
- https://arxiv.org/abs/2106.06981: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav
- https://arxiv.org/abs/2106.09488#amazon: “Scaling Laws for Acoustic Models”, Jasha Droppo, Oguz Elibol
- https://arxiv.org/abs/2103.03206#deepmind: “Perceiver: General Perception With Iterative Attention”, Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
- https://arxiv.org/abs/2102.04159: “Deep Residual Learning in Spiking Neural Networks”, Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian
- https://arxiv.org/abs/2011.12692#tencent: “Towards Playing Full MOBA Games With Deep Reinforcement Learning”
- https://arxiv.org/abs/2008.07669: “HiPPO: Recurrent Memory With Optimal Polynomial Projections”, Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Re
- https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Adam Scholl
- https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark: “Agent57: Outperforming the Human Atari Benchmark”, Adrià Puigdomènech, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
- https://arxiv.org/abs/2002.03629: “Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving”, Yang Song, Chenlin Meng, Renjie Liao, Stefano Ermon
- https://arxiv.org/abs/2001.08361#openai: “Scaling Laws for Neural Language Models”
- https://arxiv.org/abs/1911.11423: “Single Headed Attention RNN: Stop Thinking With Your Head”, Stephen Merity
- https://openreview.net/forum?id=HyxlRHBlUB: “Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks”, Aaron R. Voelker, Ivana Kajić, Chris Eliasmith
- https://arxiv.org/abs/1910.06591#deepmind: “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski
- https://arxiv.org/abs/1905.01320#deepmind: “Meta-learners’ Learning Dynamics Are unlike Learners’”, Neil C. Rabinowitz
- https://openreview.net/forum?id=r1lyTjAqYX#deepmind: “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney
- https://arxiv.org/abs/1709.02755: “SRU: Simple Recurrent Units for Highly Parallelizable Recurrence”, Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi
- https://arxiv.org/abs/1704.05526: “Learning to Reason: End-to-End Module Networks for Visual Question Answering”, Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Kate Saenko
- https://arxiv.org/abs/1704.05179: “SearchQA: A New Q&A Dataset Augmented With Context from a Search Engine”, Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho
- rnn-metadata: “RNN Metadata for Mimicking Author Style”, Gwern
- 2010-mikolov.pdf: “Recurrent Neural Network Based Language Model”, Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur
- 1991-hochreiter.pdf: “Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, Sepp Hochreiter
- 1989-williams.pdf: “Experimental Analysis of the Real-time Recurrent Learning Algorithm”, Ronald J. Williams, David Zipser