“Untersuchungen Zu Dynamischen Neuronalen Netzen [Studies of Dynamic Neural Networks]”, 1991-06-15:
[GPT-3 translation of German abstract]
Since the seminal article by Rumelhart, Hinton, and Williams [RHW86], backpropagation (BP) has become very popular as a learning method for neural networks with and without feedback. In contrast to many other learning methods for neural networks, BP takes the network structure into account and improves the network on the basis of this knowledge.
If an input from the very remote past has to influence the present output, then initially (with randomly chosen weights) this input is very unlikely to influence the present state of the network. BP algorithms therefore do not detect that this input is responsible for the desired output, and it is very hard for them to train a network to remember an input until it is needed to produce a later output. Moreover, the known BP algorithms take a very long time to compute.
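To make this vanishing-influence argument concrete, here is a minimal NumPy sketch, not from the thesis itself: the recurrence, the weight scales, and the delay length are all illustrative assumptions. It tracks the Jacobian of the hidden state with respect to an input seen at step 0; because that Jacobian is a product of per-step Jacobians, its norm typically shrinks exponentially with the delay, which is why the error signal "forgets" the responsible input.

```python
import numpy as np

# Simple recurrent net: h_t = tanh(W @ h_{t-1} + U @ x_t)
# We accumulate J = dh_t/dx_0 by the chain rule and watch its norm decay.
rng = np.random.default_rng(0)
n = 10
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # hypothetical random weights
U = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))

h = np.zeros(n)
J = None
for t in range(51):
    x = rng.normal(size=n)
    pre = W @ h + U @ x
    D = np.diag(1.0 - np.tanh(pre) ** 2)   # derivative of tanh at this step
    J = D @ U if t == 0 else D @ W @ J     # dh_t/dx_0 = D_t W dh_{t-1}/dx_0
    h = np.tanh(pre)
    if t % 10 == 0:
        print(f"t={t:2d}  ||dh_t/dx_0|| = {np.linalg.norm(J):.3e}")
```

Running this prints a norm that drops by orders of magnitude over a few dozen steps, so a gradient-based update barely credits the step-0 input for the current output.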
In many cases, though, an input must be remembered over such a delay. Mozer [Moz90] describes a network that learns to compose music, where musical phrases are repeated and later note pitches are determined by earlier ones. Or a network steers a vehicle through a labyrinth and obtains error information only when the vehicle runs into a dead end, so the error must be propagated back over many steps. If a neural network controls a robot that performs a task, perhaps some preparatory subtasks are necessary whose completion the system should remember.
In this work, we investigate how to attack the problem of the long learning time associated with network inputs that are needed later to produce a desired output. This can be done either by means of the network architecture or by exploiting the structure of the input sequences. In Chapter 4, a network architecture is constructed in which inputs received long ago are taken into account better than in the usual architectures: “storage nodes” are introduced, which can carry information across arbitrarily long time intervals (sketched below). The shortening of input sequences, while retaining all relevant information, is investigated in Chapter 3: with a shortened input sequence, the algorithms need not look so far back into the past to recognize the relevant inputs. Chapter 1 presents the BP learning algorithms used, which are then analyzed in Chapter 2 to determine the cause of the long learning time for storing past inputs. Regarding the algorithms, it should be noted that in some cases they were slightly modified to save computation time. The problem of resource allocation that arises with the methods of Chapters 3 and 4 is addressed in Chapter 5.
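The abstract does not give the exact construction of the “storage nodes”, so the following is only a hypothetical sketch of the simplest unit with the stated property: a node with a fixed self-connection of weight 1.0, whose contents (and any error signal flowing through it) are carried unchanged across arbitrarily long time intervals. The `run_storage_node` function and its write-gate are illustrative names, not the thesis's notation.

```python
def run_storage_node(inputs, writes):
    """c_t = 1.0 * c_{t-1} + write_t * input_t.

    Because the self-connection is fixed at 1.0, the stored value persists
    indefinitely, and dc_T/dc_t == 1 for every t <= T: the local gradient
    through the node does not decay with the delay.
    """
    c = 0.0
    for x, w in zip(inputs, writes):
        c = 1.0 * c + w * x   # self-connection weight fixed at 1.0
    return c

# Store the value 5.0 at step 0, then hold it through 100 empty steps.
T = 100
inputs = [5.0] + [0.0] * T
writes = [1.0] + [0.0] * T    # "write" only at the first step
print(run_storage_node(inputs, writes))  # -> 5.0, unchanged after 100 steps
```

The contrast with the previous sketch is the point: an ordinary recurrent unit multiplies its state by a Jacobian at every step, while a unit with a fixed unit self-connection passes state and error through unchanged.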
The described experiments were performed on SPARC-based Sun workstations. Due to time and resource constraints, the algorithm comparisons could not be carried out to the desired extent: some trials ran for up to a week on these machines, while other processes with higher priority were also running on them.
The definitions and notations in this work are not those commonly used in studies of neural networks; they are introduced here only for this work. The reason is that there are no uniform, fundamental definitions for neural networks on which other authors base their work. It is therefore not guaranteed that the definitions and notations are free of inconsistencies with other works.
Not all of these results were mathematically proven, as this work does not claim to be a mathematical analysis of neural networks; it is also difficult to find simple mathematical formalisms for neural networks. The work instead describes ideas and approaches, to see whether it is possible to get a better grip on the problem of the long learning time for important earlier inputs.
Besides the methods described here for learning in non-static environments, there is also the “Adaptive Critic” approach, as described in [Sch90a] and [“Recurrent networks adjusted by adaptive critics”]. The “fast weights” approach of Schmidhuber [Sch91b] also provides a storage function, although with a completely different approach than in Chapter 4, where a storage mechanism is likewise constructed.