Alexis P. Wieland recently proposed a useful benchmark task for neural networks: distinguishing between two intertwined spirals. Although this task is easy to visualize, it is hard for a network to learn due to its extreme non-linearity. In this report we exhibit a network architecture that facilitates the learning of the spiral task, and then compare the learning speeds of several variants of the back-propagation algorithm.
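To make the task concrete, the two classes can be generated as points along two interlocking spirals, one the 180-degree rotation of the other. The parametrisation below (linearly growing radius, three turns, 97 points per spiral) is an illustrative assumption, not necessarily Wieland's exact benchmark data:

```python
import numpy as np

def two_spirals(n=97, turns=3.0):
    """Generate two intertwined spirals with binary class labels.

    The parametrisation is illustrative: radius grows linearly with
    angle, and the second spiral is the first rotated by 180 degrees.
    """
    t = np.linspace(0.0, turns * np.pi, n)   # angle along the spiral
    r = t / (turns * np.pi)                  # radius in [0, 1]
    x1 = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)  # spiral A
    x2 = -x1                                               # spiral B (rotated copy)
    X = np.concatenate([x1, x2])
    y = np.concatenate([np.zeros(n), np.ones(n)])          # class labels
    return X, y

X, y = two_spirals()
print(X.shape, y.shape)  # → (194, 2) (194,)
```

Plotting `X` coloured by `y` shows why the task is so non-linear: the two classes wind around each other, so no small number of linear decision boundaries can separate them.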
…Such a highly non-linear problem would clearly benefit from the computational power of many layers. Unfortunately, back-propagation learning generally slows down by an order of magnitude every time a layer is added to a network. This is because the error signal is attenuated each time it flows through a layer, and learning progress is therefore limited by the slow adaptation of units in the early layers of a multi-layer network. To avoid this problem, we used short-cut connections to provide direct information pathways to all parts of the network. Our connection pattern differs from the usual one in that each layer is connected to every succeeding layer, rather than just to its immediate successor.
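The shortcut pattern described above can be sketched as a forward pass in which layer k receives the concatenated outputs of all preceding layers, including the input, rather than just the output of layer k-1. The layer widths, tanh squashing units, and weight initialisation below are illustrative assumptions, not details taken from the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical widths: 2 inputs (x, y coordinates), three hidden
# layers, and 1 output unit. These sizes are illustrative only.
widths = [2, 5, 5, 5, 1]

# With shortcut connections, layer k's fan-in is the total width of
# ALL earlier layers, not just the width of layer k-1.
fan_in = [sum(widths[:k + 1]) for k in range(len(widths) - 1)]
weights = [rng.normal(0.0, 0.5, size=(widths[k + 1], fan_in[k]))
           for k in range(len(widths) - 1)]
biases = [np.zeros(widths[k + 1]) for k in range(len(widths) - 1)]

def forward(x):
    """Forward pass where every layer sees every earlier layer's output."""
    activations = [np.asarray(x, dtype=float)]
    for W, b in zip(weights, biases):
        # Concatenate all earlier activations to form this layer's input.
        z = W @ np.concatenate(activations) + b
        activations.append(np.tanh(z))  # squashing unit
    return activations[-1]

print(forward([1.0, -1.0]).shape)  # → (1,)
```

Because the input and every hidden layer feed the output directly, error signals reach the early layers through short paths as well as long ones, which is the mechanism the text credits for avoiding the layer-by-layer slowdown.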
Freed from concerns of exponentially slow learning, we were able to use as many layers as we wanted. As a first guess, we tried 5 layers, meaning that the network contains an input layer, 3 hidden layers, and an output layer.
Figure 3: Network Architecture for the Spiral Problem.