- See Also
- Links
- “Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula”, Et Al 2022
- “Melting Pot 2.0”, Et Al 2022
- “Token Turing Machines”, Et Al 2022
- “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Et Al 2022
- “Over-communicate No More: Situated RL Agents Learn Concise Communication Protocols”, Et Al 2022
- “Hyperbolic Deep Reinforcement Learning”, Et Al 2022
- “Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Et Al 2022
- “Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies With One Objective (ALM)”, Et Al 2022
- “Human-level Atari 200× Faster”, Et Al 2022
- “Nearest Neighbor Non-autoregressive Text Generation”, Et Al 2022
- “A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning”, Et Al 2022
- “Improved Policy Optimization for Online Imitation Learning”, Et Al 2022
- “Fine-grained Image Captioning With CLIP Reward”, Et Al 2022
- “Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi”, Et Al 2022
- “Quantifying and Alleviating Political Bias in Language Models”, Et Al 2022
- “Machine Learning Helps Control Tokamak Plasmas”, 2022
- “Retrieval-Augmented Reinforcement Learning”, Et Al 2022
- “Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning”, Et Al 2022
- “A Data-driven Approach for Learning to Control Computers”, Et Al 2022
- “Policy Learning and Evaluation With Randomized Quasi-Monte Carlo”, Et Al 2022
- “Why Should I Trust You, Bellman? The Bellman Error Is a Poor Replacement for Value Error”, Et Al 2022
- “Learning Dynamics and Generalization in Deep Reinforcement Learning”, Et Al 2022
- “Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, 2021
- “Amortized Noisy Channel Neural Machine Translation”, Et Al 2021
- “Simple but Effective: CLIP Embeddings for Embodied AI”, Et Al 2021
- “Recurrent Model-Free RL Is a Strong Baseline for Many POMDPs”, Et Al 2021
- “DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning”, Et Al 2021
- “Batch Size-invariance for Policy Optimization”, Et Al 2021
- “MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research”, Et Al 2021
- “Bootstrapped Meta-Learning”, Et Al 2021
- “Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, Et Al 2021
- “Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Et Al 2021
- “Multi-task Curriculum Learning in a Complex, Visual, Hard-exploration Domain: Minecraft”, Et Al 2021
- “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Et Al 2021
- “Podracer Architectures for Scalable Reinforcement Learning”, Et Al 2021
- “Muesli: Combining Improvements in Policy Optimization”, Et Al 2021
- “Counter-Strike Deathmatch With Large-Scale Behavioural Cloning”, 2021
- “Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation”, 2021
- “Large Batch Simulation for Deep Reinforcement Learning”, Et Al 2021
- “Reinforcement Learning for Datacenter Congestion Control”, Et Al 2021
- “How RL Agents Behave When Their Actions Are Modified”, 2021
- “Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model”, Et Al 2021
- “MLGO: a Machine Learning Guided Compiler Optimizations Framework”, Et Al 2021
- “Evolving Reinforcement Learning Algorithms”, Co-Et Al 2021
- “Using Deep Reinforcement Learning to Reveal How the Brain Encodes Abstract State-space Representations in High-dimensional Environments”, 2020
- “Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning”, Et Al 2020
- “Offline Learning from Demonstrations and Unlabeled Experience”, Et Al 2020
- “A Unified Framework for Dopamine Signals across Timescales”, Et Al 2020
- “Adversarial Vulnerabilities of Human Decision-making”, Et Al 2020
- “Human-centric Dialog Training via Offline Reinforcement Learning”, Et Al 2020
- “Emergent Social Learning via Multi-agent Reinforcement Learning”, Et Al 2020
- “Super-Human Performance In Gran Turismo Sport Using Deep Reinforcement Learning”, Et Al 2020
- “Improving GAN Training With Probability Ratio Clipping and Sample Reweighting”, Et Al 2020
- “Conservative Q-Learning for Offline Reinforcement Learning”, Et Al 2020
- “Controlling Overestimation Bias With Truncated Mixture of Continuous Distributional Quantile Critics (TQC)”, Et Al 2020
- “Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels”, Et Al 2020
- “Evaluating the Rainbow DQN Agent in Hanabi With Unseen Partners”, Et Al 2020
- “Chip Placement With Deep Reinforcement Learning”, Et Al 2020
- “CURL: Contrastive Unsupervised Representations for Reinforcement Learning”, Et Al 2020
- “Evolving Normalization-Activation Layers”, Et Al 2020
- “Benchmarking End-to-End Behavioural Cloning on Video Games”, Et Al 2020
- “Agent57: Outperforming the Atari Human Benchmark”, Et Al 2020
- “Deep Neuroethology of a Virtual Rodent”, Et Al 2020
- “Causal Evidence Supporting the Proposal That Dopamine Transients Function As Temporal Difference Prediction Errors”, Et Al 2020
- “A Distributional Code for Value in Dopamine-based Reinforcement Learning”, Et Al 2020
- “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Et Al 2019
- “QUARL: Quantized Reinforcement Learning (ActorQ)”, Et Al 2019
- “Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors”, Et Al 2019
- “Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the Playing Field”, Et Al 2019
- “A View on Deep Reinforcement Learning in System Optimization”, Haj-Et Al 2019
- “Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Et Al 2019
- “Universal Quantum Control through Deep Reinforcement Learning”, Et Al 2019
- “Reinforcement Learning for Recommender Systems: A Case Study on Youtube”, 2019
- “Benchmarking Classic and Learned Navigation in Complex 3D Environments”, Et Al 2019
- “AutoPhase: Compiler Phase-Ordering for High Level Synthesis With Deep Reinforcement Learning”, Haj-Et Al 2019
- “Designing Neural Networks through Neuroevolution”, Et Al 2019
- “Reinforcement Learning in Artificial and Biological Systems”, 2019
- “Anxiety, Depression, and Decision Making: A Computational Perspective”, 2019
- “IRLAS: Inverse Reinforcement Learning for Architecture Search”, Et Al 2018
- “Top-K Off-Policy Correction for a REINFORCE Recommender System”, Et Al 2018
- “Quantifying Generalization in Reinforcement Learning”, Et Al 2018
- “Relative Entropy Regularized Policy Iteration”, Et Al 2018
- “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”, Et Al 2018
- “Neural Probabilistic Motor Primitives for Humanoid Control”, Et Al 2018
- “InstaNAS: Instance-aware Neural Architecture Search”, Et Al 2018
- “A Closer Look at Deep Policy Gradients”, Et Al 2018
- “One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets With RL”, Et Al 2018
- “Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, Et Al 2018
- “Learning to Perform Local Rewriting for Combinatorial Optimization”, 2018
- “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Et Al 2018
- “Benchmarking Reinforcement Learning Algorithms on Real-World Robots”, Et Al 2018
- “Deterministic Implementations for Reproducibility in Deep Reinforcement Learning”, Et Al 2018
- “Multi-task Deep Reinforcement Learning With PopArt”, Et Al 2018
- “Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction”, 2018
- “Searching Toward Pareto-Optimal Device-Aware Neural Architectures”, Et Al 2018
- “A Study of Reinforcement Learning for Neural Machine Translation”, Et Al 2018
- “Improving Abstraction in Text Summarization”, Et Al 2018
- “Learning to Optimize Join Queries With Deep Reinforcement Learning”, Et Al 2018
- “Is Q-learning Provably Efficient?”, Et Al 2018
- “InfoNCE: Representation Learning With Contrastive Predictive Coding (CPC)”, Et Al 2018
- “Maximum a Posteriori Policy Optimisation”, Et Al 2018
- “Resource-Efficient Neural Architect”, Et Al 2018
- “The Unusual Effectiveness of Averaging in GAN Training”, Et Al 2018
- “Playing Atari With Six Neurons”, Et Al 2018
- “Measuring the Intrinsic Dimension of Objective Landscapes”, Et Al 2018
- “D4PG: Distributed Distributional Deterministic Policy Gradients”, Barth-Et Al 2018
- “Optimizing Query Evaluations Using Reinforcement Learning for Web Search”, Et Al 2018
- “Delayed Impact of Fair Machine Learning”, Et Al 2018
- “Learning Memory Access Patterns”, Et Al 2018
- “Model-Ensemble Trust-Region Policy Optimization”, Et Al 2018
- “TD3: Addressing Function Approximation Error in Actor-Critic Methods”, Et Al 2018
- “Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration”, Et Al 2018
- “Unicorn: Continual Learning With a Universal, Off-policy Agent”, Et Al 2018
- “Efficient Neural Architecture Search via Parameter Sharing”, Et Al 2018
- “IMPALA: Scalable Distributed Deep-RL With Importance Weighted Actor-Learner Architectures”, Et Al 2018
- “Regularized Evolution for Image Classifier Architecture Search”, Et Al 2018
- “Interactive Grounded Language Acquisition and Generalization in a 2D World”, Et Al 2018
- “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning With a Stochastic Actor”, Et Al 2018
- “Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Et Al 2017
- “The Case for Learned Index Structures”, Et Al 2017
- “AI Safety Gridworlds”, Et Al 2017
- “Classification With Costly Features Using Deep Reinforcement Learning”, Et Al 2017
- “Towards the Use of Deep Reinforcement Learning With Global Policy For Query-based Extractive Summarisation”, 2017
- “Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection”, 2017
- “Gradient-free Policy Architecture Search and Adaptation”, Et Al 2017
- “Swish: Searching for Activation Functions”, Et Al 2017
- “Rainbow: Combining Improvements in Deep Reinforcement Learning”, Et Al 2017
- “OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Et Al 2017
- “Deep Reinforcement Learning That Matters”, Et Al 2017
- “Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning”, Et Al 2017
- “The Successor Representation in Human Reinforcement Learning”, Et Al 2017
- “Practical Block-wise Neural Network Architecture Generation”, Et Al 2017
- “Learning Policies for Adaptive Tracking With Deep Feature Cascades”, Et Al 2017
- “Reinforced Video Captioning With Entailment Rewards”, 2017
- “A Distributional Perspective on Reinforcement Learning”, Et Al 2017
- “Trial without Error: Towards Safe Reinforcement Learning via Human Intervention”, Et Al 2017
- “Tracking As Online Decision-Making: Learning a Policy from Streaming Videos With Reinforcement Learning”, III & 2017
- “Efficient Architecture Search by Network Transformation”, Et Al 2017
- “Grammatical Error Correction With Neural Reinforcement Learning”, Et Al 2017
- “Noisy Networks for Exploration”, Et Al 2017
- “Gated-Attention Architectures for Task-Oriented Language Grounding”, Et Al 2017
- “IDK Cascades: Fast Deep Learning by Learning Not to Overthink”, Et Al 2017
- “Teaching Machines to Describe Images via Natural Language Feedback”, 2017
- “Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks”, 2017
- “Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models”, Et Al 2017
- “Ask the Right Questions: Active Question Reformulation With Reinforcement Learning”, Et Al 2017
- “A Deep Reinforced Model for Abstractive Summarization”, Et Al 2017
- “Inferring and Executing Programs for Visual Reasoning”, Et Al 2017
- “Time-Contrastive Networks: Self-Supervised Learning from Video”, Et Al 2017
- “RAM: Dynamic Computational Time for Visual Attention”, Et Al 2017
- “End-to-end Optimization of Goal-driven and Visually Grounded Dialogue Systems”, Et Al 2017
- “Improving Neural Machine Translation With Conditional Sequence Generative Adversarial Nets”, Et Al 2017
- “Neural Episodic Control”, Et Al 2017
- “CoDeepNEAT: Evolving Deep Neural Networks”, Et Al 2017
- “Tuning Recurrent Neural Networks With Reinforcement Learning”, Et Al 2017
- “PathNet: Evolution Channels Gradient Descent in Super Neural Networks”, Et Al 2017
- “The Kelly Coin-Flipping Game: Exact Solutions”, Et Al 2017
- “Deep Reinforcement Learning: A Brief Survey”, Et Al 2017
- “Loss Is Its Own Reward: Self-Supervision for Reinforcement Learning”, Et Al 2016
- “Overcoming Catastrophic Forgetting in Neural Networks”, Et Al 2016
- “Self-critical Sequence Training for Image Captioning”, Et Al 2016
- “Reinforcement Learning With Unsupervised Auxiliary Tasks”, Et Al 2016
- “A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models”, Et Al 2016
- “Hybrid Computing Using a Neural Network With Dynamic External Memory”, Et Al 2016
- “Connecting Generative Adversarial Networks and Actor-Critic Methods”, 2016
- “Deep Reinforcement Learning for Mention-Ranking Coreference Models”, 2016
- “Deep Neural Networks for YouTube Recommendations”, Et Al 2016
- “The Malmo Platform for Artificial Intelligence Experimentation”, Et Al 2016
- “Progressive Neural Networks”, Et Al 2016
- “Learning to Optimize”, 2016
- “Deep Reinforcement Learning for Dialogue Generation”, Et Al 2016
- “ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning”, Et Al 2016
- “Improving Information Extraction by Acquiring External Evidence With Reinforcement Learning”, Et Al 2016
- “Asynchronous Methods for Deep Reinforcement Learning”, Et Al 2016
- “Dueling Network Architectures for Deep Reinforcement Learning”, Et Al 2015
- “Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning”, Et Al 2015
- “Prioritized Experience Replay”, Et Al 2015
- “Deep Reinforcement Learning With Double Q-learning”, Et Al 2015
- “Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, Et Al 2015
- “Reinforcement Learning Neural Turing Machines—Revised”, 2015
- “An Invitation to Imitation”, 2015
- “TRPO: Trust Region Policy Optimization”, Et Al 2015
- “DRAW: A Recurrent Neural Network For Image Generation”, Et Al 2015
- “Random Feedback Weights Support Learning in Deep Neural Networks”, Et Al 2014
- “Learning to Execute”, 2014
- “Does Temporal Discounting Explain Unhealthy Behavior? A Systematic Review and Reinforcement Learning Perspective”, Et Al 2014
- “The Arcade Learning Environment: An Evaluation Platform for General Agents”, Et Al 2012
- “Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting”, Et Al 2012
- “Neural Mechanisms of Speed-accuracy Tradeoff”, 2012
- “DAGGER: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Et Al 2010
- “Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal”, 2005
- “It Takes Two Neurons To Ride a Bicycle”, 2004
- “Learning to Drive a Bicycle Using Reinforcement Learning and Shaping”, Randløv & 1998
- “Descriptor Predictive Control: Tracking Controllers for a Riderless Bicycle”, Et Al 1996
- “Control for an Autonomous Bicycle”, 1995
- “Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, 1992
- “Experiments on the Mechanization of Game-learning Part II. Rule-Based Learning and the Human Window [BOXES]”, 1982
- “Experiments on the Mechanization of Game-learning Part I. Characterization of the Model and Its Parameters [MENACE]”, 1963
- “A Matchbox Game-Learning Machine”, 1962
- Miscellaneous
- Link Bibliography
See Also
Links
“Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula”, Et Al 2022
“Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula”, 2022-12-02
“Melting Pot 2.0”, Et Al 2022
“Melting Pot 2.0”, 2022-11-24
“Token Turing Machines”, Et Al 2022
“Token Turing Machines”, 2022-11-16
“Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Et Al 2022
“Legged Locomotion in Challenging Terrains using Egocentric Vision”, 2022-11-14
“Over-communicate No More: Situated RL Agents Learn Concise Communication Protocols”, Et Al 2022
“Over-communicate no more: Situated RL agents learn concise communication protocols”, 2022-11-02
“Hyperbolic Deep Reinforcement Learning”, Et Al 2022
“Hyperbolic Deep Reinforcement Learning”, 2022-10-04
“Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, Et Al 2022
“Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning”, 2022-09-29
“Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies With One Objective (ALM)”, Et Al 2022
“Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (ALM)”, 2022-09-18
“Human-level Atari 200× Faster”, Et Al 2022
“Human-level Atari 200× faster”, 2022-09-15
“Nearest Neighbor Non-autoregressive Text Generation”, Et Al 2022
“Nearest Neighbor Non-autoregressive Text Generation”, 2022-08-26
“A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning”, Et Al 2022
“A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning”, 2022-08-23
“Improved Policy Optimization for Online Imitation Learning”, Et Al 2022
“Improved Policy Optimization for Online Imitation Learning”, 2022-07-29
“Fine-grained Image Captioning With CLIP Reward”, Et Al 2022
“Fine-grained Image Captioning with CLIP Reward”, 2022-05-26
“Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi”, Et Al 2022
“Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi”, 2022-03-22
“Quantifying and Alleviating Political Bias in Language Models”, Et Al 2022
“Quantifying and alleviating political bias in language models”, 2022-03-01
“Machine Learning Helps Control Tokamak Plasmas”, 2022
“Machine learning helps control tokamak plasmas”, 2022-02-18
“Retrieval-Augmented Reinforcement Learning”, Et Al 2022
“Retrieval-Augmented Reinforcement Learning”, 2022-02-17
“Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning”, Et Al 2022
“Magnetic control of tokamak plasmas through deep reinforcement learning”, 2022-02-16
“A Data-driven Approach for Learning to Control Computers”, Et Al 2022
“A data-driven approach for learning to control computers”, 2022-02-16
“Policy Learning and Evaluation With Randomized Quasi-Monte Carlo”, Et Al 2022
“Policy Learning and Evaluation with Randomized Quasi-Monte Carlo”, 2022-02-16
“Why Should I Trust You, Bellman? The Bellman Error Is a Poor Replacement for Value Error”, Et Al 2022
“Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error”, 2022-01-28
“Learning Dynamics and Generalization in Deep Reinforcement Learning”, Et Al 2022
“Learning Dynamics and Generalization in Deep Reinforcement Learning”, 2022
“Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, 2021
“Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs”, 2021-12-16
“Amortized Noisy Channel Neural Machine Translation”, Et Al 2021
“Amortized Noisy Channel Neural Machine Translation”, 2021-12-16
“Simple but Effective: CLIP Embeddings for Embodied AI”, Et Al 2021
“Simple but Effective: CLIP Embeddings for Embodied AI”, 2021-11-18
“Recurrent Model-Free RL Is a Strong Baseline for Many POMDPs”, Et Al 2021
“Recurrent Model-Free RL is a Strong Baseline for Many POMDPs”, 2021-10-11
“DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning”, Et Al 2021
“DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning”, 2021-10-05
“Batch Size-invariance for Policy Optimization”, Et Al 2021
“Batch size-invariance for policy optimization”, 2021-10-01
“MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research”, Et Al 2021
“MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research”, 2021-09-27
“Bootstrapped Meta-Learning”, Et Al 2021
“Bootstrapped Meta-Learning”, 2021-09-09
“Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, Et Al 2021
“Megaverse: Simulating Embodied Agents at One Million Experiences per Second”, 2021-07-17
“Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Et Al 2021
“Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies”, 2021-07-01
“Multi-task Curriculum Learning in a Complex, Visual, Hard-exploration Domain: Minecraft”, Et Al 2021
“Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft”, 2021-06-28
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, Et Al 2021
“On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning”, 2021-05-04
“Podracer Architectures for Scalable Reinforcement Learning”, Et Al 2021
“Podracer architectures for scalable Reinforcement Learning”, 2021-04-13
“Muesli: Combining Improvements in Policy Optimization”, Et Al 2021
“Muesli: Combining Improvements in Policy Optimization”, 2021-04-13
“Counter-Strike Deathmatch With Large-Scale Behavioural Cloning”, 2021
“Counter-Strike Deathmatch with Large-Scale Behavioural Cloning”, 2021-04-09
“Efficient Transformers in Reinforcement Learning Using Actor-Learner Distillation”, 2021
“Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation”, 2021-04-04
“Large Batch Simulation for Deep Reinforcement Learning”, Et Al 2021
“Large Batch Simulation for Deep Reinforcement Learning”, 2021-03-12
“Reinforcement Learning for Datacenter Congestion Control”, Et Al 2021
“Reinforcement Learning for Datacenter Congestion Control”, 2021-02-18
“How RL Agents Behave When Their Actions Are Modified”, 2021
“How RL Agents Behave When Their Actions Are Modified”, 2021-02-15
“Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model”, Et Al 2021
“Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model”, 2021-01-15
“MLGO: a Machine Learning Guided Compiler Optimizations Framework”, Et Al 2021
“MLGO: a Machine Learning Guided Compiler Optimizations Framework”, 2021-01-13
“Evolving Reinforcement Learning Algorithms”, Co-Et Al 2021
“Evolving Reinforcement Learning Algorithms”, 2021-01-08
“Using Deep Reinforcement Learning to Reveal How the Brain Encodes Abstract State-space Representations in High-dimensional Environments”, 2020
“Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments”, 2020-12-15
“Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning”, Et Al 2020
“Autonomous navigation of stratospheric balloons using reinforcement learning”, 2020-12-02
“Offline Learning from Demonstrations and Unlabeled Experience”, Et Al 2020
“Offline Learning from Demonstrations and Unlabeled Experience”, 2020-11-27
“A Unified Framework for Dopamine Signals across Timescales”, Et Al 2020
“A Unified Framework for Dopamine Signals across Timescales”, 2020-11-27
“Adversarial Vulnerabilities of Human Decision-making”, Et Al 2020
“Adversarial vulnerabilities of human decision-making”, 2020-11-04
“Human-centric Dialog Training via Offline Reinforcement Learning”, Et Al 2020
“Human-centric Dialog Training via Offline Reinforcement Learning”, 2020-10-12
“Emergent Social Learning via Multi-agent Reinforcement Learning”, Et Al 2020
“Emergent Social Learning via Multi-agent Reinforcement Learning”, 2020-10-01
“Super-Human Performance In Gran Turismo Sport Using Deep Reinforcement Learning”, Et Al 2020
“Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning”, 2020-08-18
“Improving GAN Training With Probability Ratio Clipping and Sample Reweighting”, Et Al 2020
“Improving GAN Training with Probability Ratio Clipping and Sample Reweighting”, 2020-06-12
“Conservative Q-Learning for Offline Reinforcement Learning”, Et Al 2020
“Conservative Q-Learning for Offline Reinforcement Learning”, 2020-06-08
“Controlling Overestimation Bias With Truncated Mixture of Continuous Distributional Quantile Critics (TQC)”, Et Al 2020
“Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (TQC)”, 2020-05-08
“Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels”, Et Al 2020
“Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels”, 2020-04-28
“Evaluating the Rainbow DQN Agent in Hanabi With Unseen Partners”, Et Al 2020
“Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners”, 2020-04-28
“Chip Placement With Deep Reinforcement Learning”, Et Al 2020
“Chip Placement with Deep Reinforcement Learning”, 2020-04-22
“CURL: Contrastive Unsupervised Representations for Reinforcement Learning”, Et Al 2020
“CURL: Contrastive Unsupervised Representations for Reinforcement Learning”, 2020-04-08
“Evolving Normalization-Activation Layers”, Et Al 2020
“Evolving Normalization-Activation Layers”, 2020-04-06
“Benchmarking End-to-End Behavioural Cloning on Video Games”, Et Al 2020
“Benchmarking End-to-End Behavioural Cloning on Video Games”, 2020-04-02
“Agent57: Outperforming the Atari Human Benchmark”, Et Al 2020
“Agent57: Outperforming the Atari Human Benchmark”, 2020-03-30
“Deep Neuroethology of a Virtual Rodent”, Et Al 2020
“Deep neuroethology of a virtual rodent”, 2020-03-11
“Causal Evidence Supporting the Proposal That Dopamine Transients Function As Temporal Difference Prediction Errors”, Et Al 2020
“A Distributional Code for Value in Dopamine-based Reinforcement Learning”, Et Al 2020
“SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Et Al 2019
“SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference”, 2019-10-15
“QUARL: Quantized Reinforcement Learning (ActorQ)”, Et Al 2019
“QUARL: Quantized Reinforcement Learning (ActorQ)”, 2019-10-02
“Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors”, Et Al 2019
“Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the Playing Field”, Et Al 2019
“Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field”, 2019-08-13
“A View on Deep Reinforcement Learning in System Optimization”, Haj-Et Al 2019
“A View on Deep Reinforcement Learning in System Optimization”, 2019-08-04
“Playing the Lottery With Rewards and Multiple Languages: Lottery Tickets in RL and NLP”, Et Al 2019
“Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP”, 2019-06-06
“Universal Quantum Control through Deep Reinforcement Learning”, Et Al 2019
“Universal quantum control through deep reinforcement learning”, 2019-04-23
“Reinforcement Learning for Recommender Systems: A Case Study on Youtube”, 2019
“Reinforcement Learning for Recommender Systems: A Case Study on Youtube”, 2019-03-28
“Benchmarking Classic and Learned Navigation in Complex 3D Environments”, Et Al 2019
“Benchmarking Classic and Learned Navigation in Complex 3D Environments”, 2019-01-30
“AutoPhase: Compiler Phase-Ordering for High Level Synthesis With Deep Reinforcement Learning”, Haj-Et Al 2019
“AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning”, 2019-01-15
“Designing Neural Networks through Neuroevolution”, Et Al 2019
“Reinforcement Learning in Artificial and Biological Systems”, 2019
“Anxiety, Depression, and Decision Making: A Computational Perspective”, 2019
“Anxiety, Depression, and Decision Making: A Computational Perspective”, 2019
“IRLAS: Inverse Reinforcement Learning for Architecture Search”, Et Al 2018
“IRLAS: Inverse Reinforcement Learning for Architecture Search”, 2018-12-13
“Top-K Off-Policy Correction for a REINFORCE Recommender System”, Et Al 2018
“Top-K Off-Policy Correction for a REINFORCE Recommender System”, 2018-12-06
“Quantifying Generalization in Reinforcement Learning”, Et Al 2018
“Quantifying Generalization in Reinforcement Learning”, 2018-12-06
“Relative Entropy Regularized Policy Iteration”, Et Al 2018
“Relative Entropy Regularized Policy Iteration”, 2018-12-05
“ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”, Et Al 2018
“ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”, 2018-12-02
“Neural Probabilistic Motor Primitives for Humanoid Control”, Et Al 2018
“Neural probabilistic motor primitives for humanoid control”, 2018-11-28
“InstaNAS: Instance-aware Neural Architecture Search”, Et Al 2018
“InstaNAS: Instance-aware Neural Architecture Search”, 2018-11-26
“A Closer Look at Deep Policy Gradients”, Et Al 2018
“A Closer Look at Deep Policy Gradients”, 2018-11-06
“One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets With RL”, Et Al 2018
“One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL”, 2018-10-11
“Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, Et Al 2018
“Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, 2018-10-01
“Learning to Perform Local Rewriting for Combinatorial Optimization”, 2018
“Learning to Perform Local Rewriting for Combinatorial Optimization”, 2018-09-30
“R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Et Al 2018
“R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, 2018-09-27
“Benchmarking Reinforcement Learning Algorithms on Real-World Robots”, Et Al 2018
“Benchmarking Reinforcement Learning Algorithms on Real-World Robots”, 2018-09-20
“Deterministic Implementations for Reproducibility in Deep Reinforcement Learning”, Et Al 2018
“Deterministic Implementations for Reproducibility in Deep Reinforcement Learning”, 2018-09-15
“Multi-task Deep Reinforcement Learning With PopArt”, Et Al 2018
“Multi-task Deep Reinforcement Learning with PopArt”, 2018-09-12
“Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction”, 2018
“Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction”, 2018-09-05
“Searching Toward Pareto-Optimal Device-Aware Neural Architectures”, Et Al 2018
“Searching Toward Pareto-Optimal Device-Aware Neural Architectures”, 2018-08-29
“A Study of Reinforcement Learning for Neural Machine Translation”, Et Al 2018
“A Study of Reinforcement Learning for Neural Machine Translation”, 2018-08-27
“Improving Abstraction in Text Summarization”, Et Al 2018
“Improving Abstraction in Text Summarization”, 2018-08-23
“Learning to Optimize Join Queries With Deep Reinforcement Learning”, Et Al 2018
“Learning to Optimize Join Queries With Deep Reinforcement Learning”, 2018-08-09
“Is Q-learning Provably Efficient?”, Et Al 2018
“Is Q-learning Provably Efficient?”, 2018-07-10
“InfoNCE: Representation Learning With Contrastive Predictive Coding (CPC)”, Et Al 2018
“InfoNCE: Representation Learning with Contrastive Predictive Coding (CPC)”, 2018-07-10
“Maximum a Posteriori Policy Optimisation”, Et Al 2018
“Maximum a Posteriori Policy Optimisation”, 2018-06-14
“Resource-Efficient Neural Architect”, Et Al 2018
“Resource-Efficient Neural Architect”, 2018-06-12
“The Unusual Effectiveness of Averaging in GAN Training”, Et Al 2018
“The Unusual Effectiveness of Averaging in GAN Training”, 2018-06-12
“Playing Atari With Six Neurons”, Et Al 2018
“Playing Atari with Six Neurons”, 2018-06-04
“Measuring the Intrinsic Dimension of Objective Landscapes”, Et Al 2018
“Measuring the Intrinsic Dimension of Objective Landscapes”, 2018-04-24
“D4PG: Distributed Distributional Deterministic Policy Gradients”, Barth-Et Al 2018 (2018-04-23)
“Optimizing Query Evaluations Using Reinforcement Learning for Web Search”, Et Al 2018 (2018-04-12)
“Delayed Impact of Fair Machine Learning”, Et Al 2018 (2018-03-12)
“Learning Memory Access Patterns”, Et Al 2018 (2018-03-06)
“Model-Ensemble Trust-Region Policy Optimization”, Et Al 2018 (2018-02-28)
“TD3: Addressing Function Approximation Error in Actor-Critic Methods”, Et Al 2018 (2018-02-26)
“Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration”, Et Al 2018 (2018-02-24)
“Unicorn: Continual Learning With a Universal, Off-policy Agent”, Et Al 2018 (2018-02-22)
“Efficient Neural Architecture Search via Parameter Sharing”, Et Al 2018 (2018-02-09)
“IMPALA: Scalable Distributed Deep-RL With Importance Weighted Actor-Learner Architectures”, Et Al 2018 (2018-02-05)
“Regularized Evolution for Image Classifier Architecture Search”, Et Al 2018 (2018-02-05)
“Interactive Grounded Language Acquisition and Generalization in a 2D World”, Et Al 2018 (2018-01-31)
“Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning With a Stochastic Actor”, Et Al 2018 (2018-01-04)
“Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Et Al 2017 (2017-12-18)
“The Case for Learned Index Structures”, Et Al 2017 (2017-12-04)
“AI Safety Gridworlds”, Et Al 2017 (2017-11-27)
“Classification With Costly Features Using Deep Reinforcement Learning”, Et Al 2017 (2017-11-20)
“Towards the Use of Deep Reinforcement Learning With Global Policy For Query-based Extractive Summarisation”, 2017 (2017-11-10)
“Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection”, 2017 (2017-11-10)
“Gradient-free Policy Architecture Search and Adaptation”, Et Al 2017 (2017-10-16)
“Swish: Searching for Activation Functions”, Et Al 2017 (2017-10-16)
“Rainbow: Combining Improvements in Deep Reinforcement Learning”, Et Al 2017 (2017-10-06)
“OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning”, Et Al 2017 (2017-09-20)
“Deep Reinforcement Learning That Matters”, Et Al 2017 (2017-09-19)
“Seq2SQL: Generating Structured Queries from Natural Language Using Reinforcement Learning”, Et Al 2017 (2017-08-31)
“The Successor Representation in Human Reinforcement Learning”, Et Al 2017 (2017-08-28)
“Practical Block-wise Neural Network Architecture Generation”, Et Al 2017 (2017-08-18)
“Learning Policies for Adaptive Tracking With Deep Feature Cascades”, Et Al 2017 (2017-08-09)
“Reinforced Video Captioning With Entailment Rewards”, 2017 (2017-08-07)
“A Distributional Perspective on Reinforcement Learning”, Et Al 2017 (2017-07-21)
“Trial without Error: Towards Safe Reinforcement Learning via Human Intervention”, Et Al 2017 (2017-07-17)
“Tracking As Online Decision-Making: Learning a Policy from Streaming Videos With Reinforcement Learning”, III & 2017 (2017-07-17)
“Efficient Architecture Search by Network Transformation”, Et Al 2017 (2017-07-16)
“Grammatical Error Correction With Neural Reinforcement Learning”, Et Al 2017 (2017-07-02)
“Noisy Networks for Exploration”, Et Al 2017 (2017-06-30)
“Gated-Attention Architectures for Task-Oriented Language Grounding”, Et Al 2017 (2017-06-22)
“IDK Cascades: Fast Deep Learning by Learning Not to Overthink”, Et Al 2017 (2017-06-03)
“Teaching Machines to Describe Images via Natural Language Feedback”, 2017 (2017-06-01)
“Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks”, 2017 (2017-05-31)
“Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models”, Et Al 2017 (2017-05-30)
“Ask the Right Questions: Active Question Reformulation With Reinforcement Learning”, Et Al 2017 (2017-05-22)
“A Deep Reinforced Model for Abstractive Summarization”, Et Al 2017 (2017-05-11)
“Inferring and Executing Programs for Visual Reasoning”, Et Al 2017 (2017-05-10)
“Time-Contrastive Networks: Self-Supervised Learning from Video”, Et Al 2017 (2017-04-23)
“RAM: Dynamic Computational Time for Visual Attention”, Et Al 2017 (2017-03-30)
“End-to-end Optimization of Goal-driven and Visually Grounded Dialogue Systems”, Et Al 2017 (2017-03-15)
“Improving Neural Machine Translation With Conditional Sequence Generative Adversarial Nets”, Et Al 2017 (2017-03-15)
“Neural Episodic Control”, Et Al 2017 (2017-03-06)
“CoDeepNEAT: Evolving Deep Neural Networks”, Et Al 2017 (2017-03-01)
“Tuning Recurrent Neural Networks With Reinforcement Learning”, Et Al 2017 (2017-02-14)
“PathNet: Evolution Channels Gradient Descent in Super Neural Networks”, Et Al 2017 (2017-01-30)
“The Kelly Coin-Flipping Game: Exact Solutions”, Et Al 2017 (2017-01-19)
“Deep Reinforcement Learning: A Brief Survey”, Et Al 2017
“Loss Is Its Own Reward: Self-Supervision for Reinforcement Learning”, Et Al 2016 (2016-12-21)
“Overcoming Catastrophic Forgetting in Neural Networks”, Et Al 2016 (2016-12-02)
“Self-critical Sequence Training for Image Captioning”, Et Al 2016 (2016-12-02)
“Reinforcement Learning With Unsupervised Auxiliary Tasks”, Et Al 2016 (2016-11-16)
“A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models”, Et Al 2016 (2016-11-11)
“Hybrid Computing Using a Neural Network With Dynamic External Memory”, Et Al 2016 (2016-10-27)
“Connecting Generative Adversarial Networks and Actor-Critic Methods”, 2016 (2016-10-06)
“Deep Reinforcement Learning for Mention-Ranking Coreference Models”, 2016 (2016-09-27)
“Deep Neural Networks for YouTube Recommendations”, Et Al 2016 (2016-09-15)
“The Malmo Platform for Artificial Intelligence Experimentation”, Et Al 2016 (2016-07-01)
“Progressive Neural Networks”, Et Al 2016 (2016-06-15)
“Learning to Optimize”, 2016 (2016-06-06)
“Deep Reinforcement Learning for Dialogue Generation”, Et Al 2016 (2016-06-05)
“ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning”, Et Al 2016 (2016-05-06)
“Improving Information Extraction by Acquiring External Evidence With Reinforcement Learning”, Et Al 2016 (2016-03-25)
“Asynchronous Methods for Deep Reinforcement Learning”, Et Al 2016 (2016-02-04)
“Dueling Network Architectures for Deep Reinforcement Learning”, Et Al 2015 (2015-11-20)
“Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning”, Et Al 2015 (2015-11-19)
“Prioritized Experience Replay”, Et Al 2015 (2015-11-18)
“Deep Reinforcement Learning With Double Q-learning”, Et Al 2015 (2015-09-22)
“Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, Et Al 2015 (2015-07-15)
“Reinforcement Learning Neural Turing Machines—Revised”, 2015 (2015-05-04)
“An Invitation to Imitation”, 2015 (2015-03-14)
“TRPO: Trust Region Policy Optimization”, Et Al 2015 (2015-02-19)
“DRAW: A Recurrent Neural Network For Image Generation”, Et Al 2015 (2015-02-16)
“Random Feedback Weights Support Learning in Deep Neural Networks”, Et Al 2014 (2014-11-02)
“Learning to Execute”, 2014 (2014-10-17)
“Does Temporal Discounting Explain Unhealthy Behavior? A Systematic Review and Reinforcement Learning Perspective”, Et Al 2014
“The Arcade Learning Environment: An Evaluation Platform for General Agents”, Et Al 2012 (2012-07-19)
“Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting”, Et Al 2012 (2012-06-18)
“Neural Mechanisms of Speed-accuracy Tradeoff”, 2012
“DAGGER: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Et Al 2010 (2010-11-02)
“Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal”, 2005
“It Takes Two Neurons To Ride a Bicycle”, 2004 (2004-12-13)
“Learning to Drive a Bicycle Using Reinforcement Learning and Shaping”, Randløv & 1998
“Descriptor Predictive Control: Tracking Controllers for a Riderless Bicycle”, Et Al 1996 (1996-07-09)
“Control for an Autonomous Bicycle”, 1995 (1995-05-21)
“Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning”, 1992
“Experiments on the Mechanization of Game-learning Part II. Rule-Based Learning and the Human Window [BOXES]”, 1982
“Experiments on the Mechanization of Game-learning Part I. Characterization of the Model and Its Parameters [MENACE]”, 1963
“A Matchbox Game-Learning Machine”, 1962
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2211.07638: “Legged Locomotion in Challenging Terrains Using Egocentric Vision”, Ananye Agarwal, Ashish Kumar, Jitendra Malik, Deepak Pathak
- https://arxiv.org/abs/2210.01542#twitter: “Hyperbolic Deep Reinforcement Learning”, Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J. Hunt
- https://arxiv.org/abs/2209.08466: “Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies With One Objective (ALM)”, Raj Ghugare, Homanga Bharadhwaj, Benjamin Eysenbach, Sergey Levine, Ruslan Salakhutdinov
- https://arxiv.org/abs/2209.07550#deepmind: “Human-level Atari 200× Faster”, Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević, Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
- 2022-liu.pdf: “Quantifying and Alleviating Political Bias in Language Models”, Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Soroush Vosoughi
- https://www.nature.com/articles/s41586-021-04301-9#deepmind: “Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning”
- https://proceedings.mlr.press/v162/lyle22a/lyle22a.pdf: “Learning Dynamics and Generalization in Deep Reinforcement Learning”, Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal
- https://proceedings.mlr.press/v139/vicol21a.html: “Unbiased Gradient Estimation in Unrolled Computation Graphs With Persistent Evolution Strategies”, Paul Vicol, Luke Metz, Jascha Sohl-Dickstein
- https://arxiv.org/abs/2104.06272#deepmind: “Podracer Architectures for Scalable Reinforcement Learning”, Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt
- https://arxiv.org/abs/2004.13649: “Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels”, Ilya Kostrikov, Denis Yarats, Rob Fergus
- https://openreview.net/forum?id=SyxrxR4KPS#deepmind: “Deep Neuroethology of a Virtual Rodent”, Josh Merel, Diego Aldarondo, Jesse Marshall, Yuval Tassa, Greg Wayne, Bence Olveczky (DM/Harvard)
- https://arxiv.org/abs/1910.06591#deepmind: “SEED RL: Scalable and Efficient Deep-RL With Accelerated Central Inference”, Lasse Espeholt, Raphaël Marinier, Piotr Stanczyk, Ke Wang, Marcin Michalski
- https://arxiv.org/abs/1910.01055#google: “QUARL: Quantized Reinforcement Learning (ActorQ)”
- https://openreview.net/forum?id=r1lyTjAqYX#deepmind: “R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning”, Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney
- https://arxiv.org/abs/1806.04498: “The Unusual Effectiveness of Averaging in GAN Training”, Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar
- https://arxiv.org/abs/1712.06567#uber: “Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune
- https://arxiv.org/abs/1710.02298#deepmind: “Rainbow: Combining Improvements in Deep Reinforcement Learning”
- coin-flip: “The Kelly Coin-Flipping Game: Exact Solutions”, Gwern Branwen, Arthur Breitman, nshepperd, FeepingCreature, Gurkenglas
- https://arxiv.org/abs/1612.00563: “Self-critical Sequence Training for Image Captioning”, Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel
- 2004-cook.pdf: “It Takes Two Neurons To Ride a Bicycle”, Matthew Cook