"Exploration Strategies in Deep Reinforcement Learning", 2020-06-07:
Exploitation versus exploration is a critical topic in Reinforcement Learning. We'd like the RL agent to find the best solution as fast as possible. However, committing to a solution too quickly without enough exploration is risky, as it could trap the agent in a local optimum or lead to total failure. Modern RL algorithms that optimize for the best returns can achieve good exploitation quite efficiently, while exploration remains more of an open research topic.
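To make the trade-off concrete, here is a minimal sketch of ε-greedy action selection, the simplest of the classic exploration strategies covered below: with a small probability ε the agent explores a uniformly random action, and otherwise it exploits its current value estimates. The function name and toy Q-values here are my own, for illustration only.

```python
import numpy as np

def epsilon_greedy_action(q_values: np.ndarray, epsilon: float,
                          rng: np.random.Generator) -> int:
    """With probability epsilon, explore a uniformly random action;
    otherwise exploit the action with the highest estimated Q-value."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

# Toy usage: with epsilon=0.1, the greedy action (index 1) is picked
# with probability 0.9 + 0.1/3, roughly 0.93.
rng = np.random.default_rng(seed=0)
q_values = np.array([0.1, 0.5, 0.2])
actions = [epsilon_greedy_action(q_values, epsilon=0.1, rng=rng)
           for _ in range(10_000)]
print(sum(a == 1 for a in actions) / len(actions))
```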
I would like to discuss several common exploration strategies in Deep RL here. As this is a very big topic, this post by no means covers all the important subtopics. I plan to update it periodically and keep enriching the content gradually over time.
- Classic Exploration Strategies
- Key Exploration Problems
  - The Hard-Exploration Problem
  - The Noisy-TV Problem
- Intrinsic Rewards as Exploration Bonuses
  - Count-based Exploration
    - Counting by Density Model
    - Counting after Hashing
  - Prediction-based Exploration
    - Forward Dynamics
    - Random Networks
    - Physical Properties
- Memory-based Exploration
  - Episodic Memory
  - Direct Exploration
- Q-Value Exploration
- Variational Options