"A Jamming Transition from Under-Parameterization to Over-Parameterization Affects Loss Landscape and Generalization", 2018-10-22:
We argue that in fully-connected networks a phase transition separates the over-parameterized regime, where the training data can be fit, from the under-parameterized regime, where it cannot. Under some general conditions, we show that this transition is sharp for the hinge loss.
In the whole over-parameterized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization.
Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parameterized network, we observe that the generalization error displays three phases: (1) an initial decay, (2) an increase until the transition point, where it displays a cusp, and (3) a slow decay toward a constant throughout the over-parameterized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
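The fitting transition the abstract describes can be illustrated in a toy setting. The sketch below (my own illustration, not the paper's experiment) minimizes the hinge loss of a *linear* model on randomly labeled Gaussian data by subgradient descent: when the number of parameters comfortably exceeds the number of constraints (samples), the loss reaches zero; when it is much smaller, a positive loss remains. The function name, learning rate, and data sizes are arbitrary choices for the demo.

```python
import numpy as np

def final_hinge_loss(n_samples, n_params, lr=0.1, steps=10_000, seed=0):
    """Minimize the hinge loss of a linear model on randomly labeled
    Gaussian data by subgradient descent; return the final training loss."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_params))
    y = rng.choice([-1.0, 1.0], size=n_samples)
    w = np.zeros(n_params)
    for _ in range(steps):
        margins = y * (X @ w)
        violated = margins < 1.0         # samples whose constraint is unsatisfied
        if not violated.any():
            break                        # every margin >= 1: hinge loss is exactly 0
        grad = -(X * (y * violated)[:, None]).mean(axis=0)
        w -= lr * grad
    return float(np.maximum(0.0, 1.0 - y * (X @ w)).mean())

# Over-parameterized: more parameters than samples, all constraints satisfiable.
print(final_hinge_loss(n_samples=50, n_params=200))   # ~0
# Under-parameterized: far fewer parameters than samples, a positive loss remains.
print(final_hinge_loss(n_samples=200, n_params=10))   # > 0
```

The linear case only mimics the qualitative picture; in the paper the transition occurs in fully-connected networks, where the effective number of constraints relative to parameters plays the analogous role.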