Deep Reinforcement Learning Without Experience Replay, Target Networks, or Batch Updates
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Training Language Models to Self-Correct via Reinforcement Learning
Mind Wandering During Implicit Learning Is Associated With Increased Periodic EEG Activity And Improved Extraction Of Hidden Probabilistic Patterns
Alexa Is in Millions of Households—and Amazon Is Losing Billions
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Predictive auxiliary objectives in deep RL mimic learning in the brain
Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning
Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
What Are Dreams For? Converging lines of research suggest that we might be misunderstanding something we do every night of our lives
Low-Poly Image Generation Using Evolutionary Algorithms in Ruby
Using temperature to analyze the neural basis of a time-based decision
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
Twitching in Sensorimotor Development from Sleeping Rats to Robots
Lucy-SKG: Learning to Play Rocket League Efficiently Using Deep Reinforcement Learning
Improving Language Models with Advantage-based Offline Policy Gradients
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Bridging Discrete and Backpropagation: Straight-Through and Beyond
A circuit mechanism linking past and future learning through shifts in perception
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Legged Locomotion in Challenging Terrains using Egocentric Vision
Over-communicate no more: Situated RL agents learn concise communication protocols
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (ALM)
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Improved Policy Optimization for Online Imitation Learning
Offline RL for Natural Language Generation with Implicit Language Q Learning
Reward Bases: Instantaneous reward revaluation with temporal difference learning
Is Vanilla Policy Gradient Overlooked? Analyzing Deep Reinforcement Learning for Hanabi
Quantifying and alleviating political bias in language models
Policy Learning and Evaluation with Randomized Quasi-Monte Carlo
Magnetic control of tokamak plasmas through deep reinforcement learning
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
Learning Dynamics and Generalization in Deep Reinforcement Learning
Deep Reinforcement Learning Policies Learn Shared Adversarial Features Across MDPs
Offline Reinforcement Learning with Implicit Q-Learning (IQL)
Recurrent Model-Free RL is a Strong Baseline for Many POMDPs
DroQ: Dropout Q-Functions for Doubly Efficient Reinforcement Learning
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Megaverse: Simulating Embodied Agents at One Million Experiences per Second
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
Podracer architectures for scalable Reinforcement Learning
Counter-Strike Deathmatch with Large-Scale Behavioral Cloning
ALD: Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Replay in Deep Learning: Current Approaches and Missing Biological Elements
The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games
A✱ Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks
Randomized Ensembled Double Q-Learning (REDQ): Learning Fast Without a Model
MLGO: a Machine Learning Guided Compiler Optimizations Framework
Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments
Autonomous navigation of stratospheric balloons using reinforcement learning
A Unified Framework for Dopamine Signals across Timescales
Offline Learning from Demonstrations and Unlabeled Experience
Human-centric Dialog Training via Offline Reinforcement Learning
Emergent Social Learning via Multi-agent Reinforcement Learning
Super-Human Performance in Gran Turismo Sport Using Deep Reinforcement Learning
SPR: Data-Efficient Reinforcement Learning with Self-Predictive Representations
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Improving GAN Training with Probability Ratio Clipping and Sample Reweighting
Conservative Q-Learning for Offline Reinforcement Learning
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (TQC)
Evaluating the Rainbow DQN Agent in Hanabi with Unseen Partners
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
CURL: Contrastive Unsupervised Representations for Reinforcement Learning
Q✱ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
Causal Evidence Supporting the Proposal That Dopamine Transients Function As Temporal Difference Prediction Errors
A Distributional Code for Value in Dopamine-Based Reinforcement Learning
Combining Q-Learning and Search with Amortized Value Estimates
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors
Exponential slowdown for larger populations: The (μ+1)-EA on monotone functions
Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field
A View on Deep Reinforcement Learning in System Optimization
Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
A General Dichotomy of Evolutionary Algorithms on Monotone Functions
Universal quantum control through deep reinforcement learning
Reinforcement Learning for Recommender Systems: A Case Study on Youtube
Benchmarking Classic and Learned Navigation in Complex 3D Environments
AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning
Anxiety, Depression, and Decision Making: A Computational Perspective
Reinforcement Learning in Artificial and Biological Systems
IRLAS: Inverse Reinforcement Learning for Architecture Search
Top-K Off-Policy Correction for a REINFORCE Recommender System
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Neural probabilistic motor primitives for humanoid control
One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL
Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow
Learning to Perform Local Rewriting for Combinatorial Optimization
R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning
Benchmarking Reinforcement Learning Algorithms on Real-World Robots
Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
Searching Toward Pareto-Optimal Device-Aware Neural Architectures
A Study of Reinforcement Learning for Neural Machine Translation
Learning to Optimize Join Queries With Deep Reinforcement Learning
InfoNCE: Representation Learning with Contrastive Predictive Coding (CPC)
D4PG: Distributed Distributional Deterministic Policy Gradients
Optimizing Query Evaluations using Reinforcement Learning for Web Search
TD3: Addressing Function Approximation Error in Actor-Critic Methods
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Unicorn: Continual Learning with a Universal, Off-policy Agent
ENAS: Efficient Neural Architecture Search via Parameter Sharing
Regularized Evolution for Image Classifier Architecture Search
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Interactive Grounded Language Acquisition and Generalization in a 2D World
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Classification with Costly Features using Deep Reinforcement Learning
Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection
Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarization
Rainbow: Combining Improvements in Deep Reinforcement Learning
OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
The successor representation in human reinforcement learning
Practical Block-wise Neural Network Architecture Generation
Learning Policies for Adaptive Tracking with Deep Feature Cascades
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
Grammatical Error Correction with Neural Reinforcement Learning
Gated-Attention Architectures for Task-Oriented Language Grounding
Deep reinforcement learning from human preferences § Appendix A.2: Atari
Towards Synthesizing Complex Programs from Input-Output Examples
IDK Cascades: Fast Deep Learning by Learning not to Overthink
Teaching Machines to Describe Images via Natural Language Feedback
Learning Time/Memory-Efficient Deep Architectures with Budgeted Super Networks
Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
Time-Contrastive Networks: Self-Supervised Learning from Video
Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks (EPANNs)
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
End-to-end optimization of goal-driven and visually grounded dialogue systems
Tuning Recurrent Neural Networks with Reinforcement Learning
PathNet: Evolution Channels Gradient Descent in Super Neural Networks
Your TL;DR by an AI: A Deep Reinforced Model for Abstractive Summarization
Loss is its own Reward: Self-Supervision for Reinforcement Learning
Neural Combinatorial Optimization with Reinforcement Learning
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
Hybrid computing using a neural network with dynamic external memory
Connecting Generative Adversarial Networks and Actor-Critic Methods
Deep Reinforcement Learning for Mention-Ranking Coreference Models
The Malmo Platform for Artificial Intelligence Experimentation
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Gorila: Massively Parallel Methods for Deep Reinforcement Learning
Random feedback weights support learning in deep neural networks
Does temporal discounting explain unhealthy behavior? A systematic review and reinforcement learning perspective
The Arcade Learning Environment: An Evaluation Platform for General Agents
Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting
DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Compositional pattern producing networks: A novel abstraction of development
Midbrain dopamine neurons encode a quantitative reward prediction error signal
Recent Developments in the Evolution of Morphologies and Controllers for Physically Simulated Creatures § A Re-implementation of Sims’ Work Using the MathEngine Physics Engine
Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Descriptor predictive control: Tracking controllers for a riderless bicycle
Simple statistical gradient-following algorithms for connectionist reinforcement learning
Proceedings of the First International Conference on Genetic Algorithms and Their Applications
Experiments on the Mechanization of Game-Learning Part II. Rule-Based Learning and the Human Window [BOXES]
Experiments on the Mechanization of Game-Learning Part I. Characterization of the Model and Its Parameters [MENACE]
Some Studies in Machine Learning Using the Game of Checkers
Sutton & Barto Book: Reinforcement Learning: An Introduction
Trackmania I—The History of Machine Learning in Trackmania
The 37 Implementation Details of Proximal Policy Optimization
Microsoft and Meta Join Google in Using AI to Help Run Their Data Centers
Sony’s Racing Car AI Just Destroyed Its Human Competitors—By Being Nice (and Fast)
Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning [Video]
Measuring the Intrinsic Dimension of Objective Landscapes [Video]
2001-cook-figure2-chaoticdynamicsofunsteeredvirtualbicycleover800runs.png
http://vision.psych.umn.edu/groups/schraterlab/dearden98bayesian.pdf
https://github.com/curiousjp/toy_sd_genetics?tab=readme-ov-file#toy_sd_genetics
https://github.com/deepmind/acme/tree/master/acme/agents/tf/dmpo
https://journals.sagepub.com/doi/10.1177/17456916231204811
https://research.google/blog/quantization-for-fast-and-environmentally-sustainable-reinforcement-learning/
https://web.archive.org/web/20140918110745/http://friggeri.net/blog/a-genetic-approach-to-css-compression/
https://www.lesswrong.com/posts/DKtWikjcdApRj3rWr/paper-understanding-and-controlling-a-maze-solving-policy
https://www.lesswrong.com/posts/S54HKhxQyttNLATKu/deconfusing-direct-vs-amortised-optimization
https://www.quantamagazine.org/memories-help-brains-recognize-new-events-worth-remembering-20230517/
https://www.reddit.com/r/MachineLearning/comments/18eh2hb/p_the_power_of_reinforcement_learning_look_how/
https://www.reddit.com/r/MachineLearning/comments/1anv7n4/p_ai_learns_pvp_in_old_school_runescape/
Deep Reinforcement Learning Without Experience Replay, Target Networks, or Batch Updates
https://openreview.net/forum?id=yqQJGTDGXN
https://arxiv.org/abs/2310.03882#deepmind
Comparative study of model-based and model-free reinforcement learning control performance in HVAC systems
/doc/reinforcement-learning/model/2023-gao.pdf
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
WizardLM: Empowering Large Language Models to Follow Complex Instructions
Legged Locomotion in Challenging Terrains using Egocentric Vision
https://arxiv.org/abs/2210.01542#twitter
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective (ALM)
https://arxiv.org/abs/2209.07550#deepmind
Quantifying and alleviating political bias in language models
/doc/ai/nn/transformer/gpt/2/2022-liu-3.pdf
Magnetic control of tokamak plasmas through deep reinforcement learning
https://www.nature.com/articles/s41586-021-04301-9#deepmind
Learning Dynamics and Generalization in Deep Reinforcement Learning
https://proceedings.mlr.press/v162/lyle22a/lyle22a.pdf
PES: Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies
https://proceedings.mlr.press/v139/vicol21a.html
Podracer architectures for scalable Reinforcement Learning
https://arxiv.org/abs/2104.06272#deepmind
The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
https://openreview.net/forum?id=SyxrxR4KPS#deepmind
Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
https://arxiv.org/abs/1910.06591#deepmind
https://arxiv.org/abs/1910.01055#google
https://karpathy.github.io/2019/04/25/recipe/
R2D2: Recurrent Experience Replay in Distributed Reinforcement Learning
https://openreview.net/forum?id=r1lyTjAqYX#deepmind
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning
Jeff Clune—Professor—Computer Science—University of British Columbia
https://arxiv.org/abs/1712.06567#uber
Rainbow: Combining Improvements in Deep Reinforcement Learning
https://arxiv.org/abs/1710.02298#deepmind
https://www.sciencedirect.com/science/article/pii/S0896627317303653
/doc/reinforcement-learning/model-free/2004-cook.pdf
Recent Developments in the Evolution of Morphologies and Controllers for Physically Simulated Creatures § A Re-implementation of Sims’ Work Using the MathEngine Physics Engine
Wikipedia Bibliography: