“Memorization Without Generalization in a Multilayered Neural Network”, 1992:
The supervised learning of a rule that can be realized by a multilayer neural network (the teacher) functioning as a parity machine with K = 2 hidden units and non-overlapping receptive fields is studied. The student network is assumed to have the same architecture as the teacher.
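The architecture described above can be sketched in a few lines: each of the K = 2 hidden units computes the sign of a perceptron restricted to its own (non-overlapping) half of the input, and the machine's output is the parity (product) of the two hidden signs. This is a minimal illustration, not the paper's simulation code; the input dimension and weight distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20  # input dimension (assumed; even, so the two receptive fields are equal)

def parity_machine(weights, x):
    """K = 2 parity machine with non-overlapping receptive fields:
    each hidden unit sees only its half of the input; the output is
    the product (parity) of the two hidden-unit signs."""
    half = len(x) // 2
    h1 = np.sign(weights[0] @ x[:half])
    h2 = np.sign(weights[1] @ x[half:])
    return h1 * h2

# Teacher: fixed random weights defining the rule to be learned.
teacher = rng.standard_normal((2, N // 2))
x = rng.choice([-1.0, 1.0], size=N)   # a random binary example
label = parity_machine(teacher, x)    # +1 or -1
```

Note the internal symmetry of the parity machine: negating both hidden-unit weight vectors flips both hidden signs and leaves the output unchanged, which is one reason learning such rules is hard.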
Application of statistical mechanics shows that when the number of examples is smaller than a critical value P✱, the trained network memorizes the examples but is unable to generalize the rule from them; generalization sets in at P✱ through a phase transition.
Numerical simulations exhibiting this phenomenon are discussed.