“Playing Atari With Deep Reinforcement Learning”, 2013-12-19 ():
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.
We apply our method to 7 Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on 6 of the games and surpasses a human expert on 3 of them.