“A Jamming Transition from Under-Parameterization to Over-Parameterization Affects Loss Landscape and Generalization”, Stefano Spigler, Mario Geiger, Stéphane d’Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart (2018-10-22):

We argue that in fully-connected networks a phase transition delimits the over-parameterized and under-parameterized regimes, in which fitting the training data can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss.
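A minimal sketch of the fitting criterion (not the authors' code): train a small fully-connected net with a quadratic hinge loss on randomly-labeled data, and declare "fitting achieved" when every training point satisfies the margin, i.e. the loss reaches exactly zero. The width `h`, margin `DELTA`, data distribution, learning rate, and step budget are all illustrative choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
P, d, h, DELTA = 200, 20, 50, 1.0                 # training points, input dim, hidden width, margin
X = torch.randn(P, d)
y = (torch.randint(0, 2, (P,)) * 2 - 1).float()   # random +/-1 labels

net = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(10_000):
    opt.zero_grad()
    f = net(X).squeeze(-1)
    gap = torch.relu(DELTA - y * f)   # unsatisfied part of each margin constraint
    loss = 0.5 * gap.pow(2).mean()    # quadratic hinge loss
    loss.backward()
    opt.step()
    if (gap > 0).sum() == 0:          # every constraint satisfied: over-parameterized side
        print(f"fitted at step {step}")
        break
else:
    print(f"not fitted; {(gap > 0).sum().item()} constraints still unsatisfied")
```

With a hinge loss the criterion is sharp: a point either contributes a constraint (margin below `DELTA`) or drops out of the loss entirely, which is what makes the fitted/unfitted distinction well-defined.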

Throughout the over-parameterized regime, poor minima of the loss are not encountered during training, since the number of constraints to satisfy is too small to hamper minimization.
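The constraint-counting picture can be made concrete with a small helper (hypothetical, not from the paper) that counts the active constraints: training points whose margin is still below the threshold and which therefore contribute to the hinge loss and its gradient:

```python
import torch

def n_delta(net, X, y, delta=1.0):
    """Count active constraints: points with margin y*f(x) below delta,
    i.e. the only points that still contribute to the hinge loss."""
    with torch.no_grad():
        f = net(X).squeeze(-1)
        return int((y * f < delta).sum())
```

Comparing `n_delta(net, X, y)` against the parameter count, `sum(p.numel() for p in net.parameters())`, gives a rough sense of which side of the transition a given model sits on: when parameters greatly outnumber active constraints, minimization is unhampered.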

Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parameterized network, we observe that the generalization error displays three phases: (1) an initial decay, (2) an increase up to the transition point, where it displays a cusp, and (3) a slow decay toward a constant for the rest of the over-parameterized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
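An illustrative sweep (a toy setup, not the paper's experiments): vary the hidden width, train each net to convergence on the same data, and record the test error. Plotted against width, the resulting curve is the kind of measurement in which the three phases and the cusp near the transition would appear; the teacher-generated labels, widths, and training schedule below are assumptions for the sketch:

```python
import torch
import torch.nn as nn

def train_and_eval(h, Xtr, ytr, Xte, yte, delta=1.0, steps=5000, lr=0.1):
    """Train a width-h one-hidden-layer net with quadratic hinge loss;
    return its test classification error."""
    net = nn.Sequential(nn.Linear(Xtr.shape[1], h), nn.ReLU(), nn.Linear(h, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        gap = torch.relu(delta - ytr * net(Xtr).squeeze(-1))
        (0.5 * gap.pow(2).mean()).backward()
        opt.step()
    with torch.no_grad():
        preds = torch.sign(net(Xte).squeeze(-1))
        return (preds != yte).float().mean().item()

torch.manual_seed(0)
d, P = 20, 200
teacher = torch.randn(d)                        # labels from a random linear "teacher"
Xtr, Xte = torch.randn(P, d), torch.randn(2000, d)
ytr, yte = torch.sign(Xtr @ teacher), torch.sign(Xte @ teacher)

for h in [2, 4, 8, 16, 32, 64, 128, 256]:       # sweep from under- to over-parameterized
    print(f"width {h:4d}: test error {train_and_eval(h, Xtr, ytr, Xte, yte):.3f}")
```

The smallest widths sit in phase (1), widths just below the fitting threshold in phase (2), and the largest widths in phase (3), where the error slowly approaches its asymptotic value.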