Deep reinforcement learning (RL) algorithms have recently achieved remarkable successes in various sequential decision-making tasks, leveraging advances in methods for training large deep networks. However, these methods usually require large amounts of training data, which is often prohibitive in real-world applications. A natural question to ask is whether learning good representations for states and using larger networks helps in learning better policies.
In this paper, we study whether increasing input dimensionality improves the performance and sample efficiency of model-free deep RL algorithms. To do so, we propose an online feature extractor network (OFENet), a neural network (a DenseNet-style variant of the MLP) that produces good representations to be used as inputs to deep RL algorithms.
Even though high-dimensional inputs are usually thought to make learning more difficult for RL agents, we show that agents in fact learn more efficiently with the high-dimensional representation than with the lower-dimensional state observations. We believe that stronger feature propagation together with larger networks (and thus a larger search space) allows RL agents to learn more complex functions of states, which improves sample efficiency.
Through numerical experiments, we show that the proposed method outperforms several other state-of-the-art algorithms in terms of both sample efficiency and performance.
Code for the proposed method is available at merl.com.
To combine the advantages of deep and shallow layers, we use MLP-DenseNet, a slightly modified version of DenseNet (Huang et al., 2017), as the network architecture of OFENet. Each layer of MLP-DenseNet produces an output y that is the concatenation of the input x and the input transformed by a weight matrix W1 and an activation:

y = [x, σ(W1 x)]

where [x1, x2] denotes concatenation, σ is the activation function, and biases are omitted to simplify notation. Since each layer's output contains that layer's input, the raw input and the outputs of shallow layers are naturally contained in the final output.
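As a minimal sketch of such a layer, the concatenation above can be written in a few lines of numpy (the function name `dense_layer` and the choice of tanh as the activation σ are illustrative assumptions, not from the paper):

```python
import numpy as np

def dense_layer(x, W, activation=np.tanh):
    # MLP-DenseNet layer: the output concatenates the raw input x
    # with activation(W @ x). Biases are omitted, matching the
    # paper's simplified notation y = [x, sigma(W1 x)].
    return np.concatenate([x, activation(W @ x)])
```

Note that if x has dimension d and W has shape (h, d), the output has dimension d + h, so the representation grows with each layer while earlier features are preserved verbatim.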
The mappings φ_o and φ_{o,a} are each represented with an MLP-DenseNet. The mapping φ_o receives the observation o_t as input, and the mapping φ_{o,a} receives the concatenation of the observation representation z_{o_t} and the action a_t as its input. Figure 2 shows an example of these mappings in the proposed OFENet.
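The composition of the two mappings can be sketched as follows, assuming each mapping is a stack of the concatenating layers described above (the function names and the number of layers per block are hypothetical; the real OFENet is trained online with an auxiliary task not shown here):

```python
import numpy as np

def dense_block(x, weights, activation=np.tanh):
    # A stack of MLP-DenseNet layers: each layer concatenates its
    # input with the activated linear transform, so the raw input
    # and shallow-layer features persist into the final output.
    for W in weights:
        x = np.concatenate([x, activation(W @ x)])
    return x

def ofenet_features(obs, act, phi_o_weights, phi_oa_weights):
    # z_{o_t} = phi_o(o_t): representation of the observation alone.
    z_o = dense_block(obs, phi_o_weights)
    # z_{o_t,a_t} = phi_{o,a}([z_{o_t}, a_t]): joint representation
    # of the observation features and the action.
    z_oa = dense_block(np.concatenate([z_o, act]), phi_oa_weights)
    return z_o, z_oa
```

A deep RL algorithm would then consume z_{o_t} in place of o_t (e.g. for the policy) and z_{o_t,a_t} in place of (o_t, a_t) (e.g. for the Q-function).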
Figure 1: The model network of ML-DDPG. FC represents a fully-connected layer, and concat represents a concatenation of its inputs.
Figure 2: An example of the online feature extractor. FC represents a fully-connected layer with an activation function, and concat represents a concatenation of the inputs.