“Gorila: Massively Parallel Methods for Deep Reinforcement Learning”, 2015-07-15:
We present the first massively distributed architecture for deep reinforcement learning, Gorila.
This architecture uses four main components: parallel actors that generate new behavior; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behavior policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN).
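A minimal single-process sketch of how those four components fit together, assuming a toy environment, a linear Q-function, and made-up hyperparameters (this is an illustration of the decomposition, not the authors' implementation, which runs each component as separate distributed processes):

```python
# Illustrative sketch (not the Gorila code) of the four components collapsed
# into one process: a parameter server holding the master Q-network weights,
# an actor generating new behaviour, a shared store of experience, and a
# learner that samples stored experience and ships DQN gradients to the server.
# The toy environment, linear Q-function, and hyperparameters are assumptions.
import random
import numpy as np

STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99

class ParameterServer:
    """Master copy of the (here: linear) Q-network parameters."""
    def __init__(self):
        self.theta = np.zeros((STATE_DIM, N_ACTIONS))
    def apply_gradient(self, grad, lr=0.01):
        self.theta -= lr * grad            # asynchronous SGD update
    def snapshot(self):
        return self.theta.copy()           # actors/learners sync from this

class ReplayStore:
    """Distributed store of experience (here: a bounded in-memory list)."""
    def __init__(self, capacity=10_000):
        self.buffer, self.capacity = [], capacity
    def add(self, transition):
        self.buffer.append(transition)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)
    def sample(self, n):
        return random.sample(self.buffer, min(n, len(self.buffer)))

def q_values(theta, s):
    return s @ theta

def actor_step(theta, store, eps=0.1):
    """One actor interaction: act epsilon-greedily, store the transition."""
    s = np.random.randn(STATE_DIM)
    greedy = int(np.argmax(q_values(theta, s)))
    a = random.randrange(N_ACTIONS) if random.random() < eps else greedy
    r = float(s[a])                        # toy reward signal
    s_next = np.random.randn(STATE_DIM)
    store.add((s, a, r, s_next))

def learner_step(theta, target_theta, store, batch_size=32):
    """One learner step: sample stored experience, compute a DQN gradient."""
    grad = np.zeros_like(theta)
    batch = store.sample(batch_size)
    for s, a, r, s_next in batch:
        target = r + GAMMA * np.max(q_values(target_theta, s_next))
        td_error = q_values(theta, s)[a] - target
        grad[:, a] += td_error * s         # gradient of 0.5 * td_error^2
    return grad / max(len(batch), 1)

if __name__ == "__main__":
    server, store = ParameterServer(), ReplayStore()
    target_theta = server.snapshot()       # periodically refreshed target network
    for step in range(1, 1001):
        theta = server.snapshot()          # actors and learners pull fresh params
        actor_step(theta, store)           # actor generates new behaviour
        if len(store.buffer) >= 32:
            server.apply_gradient(learner_step(theta, target_theta, store))
        if step % 100 == 0:
            target_theta = server.snapshot()
```

In the actual architecture these pieces are replicated and run asynchronously (many actors, many learners, a sharded parameter server, and a distributed replay memory); the sketch only shows how experience and gradients flow between them.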
Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters across games. Its performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-clock time required to achieve these results by an order of magnitude on most games.