“Mastering Atari Games With Limited Data”, 2021-10-30:
Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been progress in sample-efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample-efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance and 116.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience, and outperforms state-based SAC in some tasks on the DMControl 100k benchmark.
This is the first time an algorithm has achieved super-human performance on Atari games with so little data. EfficientZero’s performance is also close to DQN’s performance at 200 million frames, while consuming 500× less data.
EfficientZero’s low sample complexity and high performance can bring RL closer to real-world applicability.
We implement our algorithm in an easy-to-understand manner and it is available at https://github.com/YeWR/EfficientZero. We hope it will accelerate the research of MCTS-based RL algorithms in the wider community.
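The figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the EfficientZero code. It assumes the usual Atari 100k conventions, which the abstract does not state: a frame skip of 4 and an emulator rate of 60 frames per second. It also assumes the standard human-normalized score formula. The per-game scores in `example_games` are made-up placeholders, not results from the paper.

```python
from statistics import mean, median

AGENT_STEPS = 100_000      # environment steps in the Atari 100k benchmark
FRAME_SKIP = 4             # assumption: each agent step spans 4 emulator frames
FRAMES_PER_SECOND = 60     # assumption: the emulator runs at 60 frames per second
DQN_FRAMES = 200_000_000   # DQN's data budget, as quoted in the abstract


def data_budget(agent_steps=AGENT_STEPS, frame_skip=FRAME_SKIP,
                fps=FRAMES_PER_SECOND):
    """Return (emulator frames, hours of real-time play) for a step budget."""
    frames = agent_steps * frame_skip
    hours = frames / fps / 3600
    return frames, hours


def human_normalized(agent, random, human):
    """Human-normalized score in percent: 0% = random play, 100% = human."""
    return 100.0 * (agent - random) / (human - random)


frames, hours = data_budget()
print(frames)               # 400000 frames
print(round(hours, 2))      # 1.85, i.e. roughly two hours of play
print(DQN_FRAMES / frames)  # 500.0, the "500x less data" ratio

# Hypothetical (agent, random, human) raw scores, for illustration only.
example_games = {
    "game_a": (30.0, 10.0, 20.0),
    "game_b": (5.0, 0.0, 10.0),
    "game_c": (12.0, 2.0, 12.0),
}
normalized = [human_normalized(*scores) for scores in example_games.values()]
print(normalized)           # [200.0, 50.0, 100.0]
print(median(normalized))   # 100.0
print(round(mean(normalized), 1))
```

Under these assumptions, 100k steps correspond to 400k frames, which matches both the "two hours" and the "500×" claims. The mean and median can differ substantially because a few games with very high normalized scores pull the mean up, which is why both are reported.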