Deep Double Descent: We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time.
https://arxiv.org/abs/1912.02292
Visualizing the Loss Landscape of Neural Nets
https://arxiv.org/abs/1712.09913
Essentially No Barriers in Neural Network Energy Landscape
https://arxiv.org/abs/1803.00885
A jamming transition from under- to over-parametrization affects loss landscape and generalization
https://arxiv.org/abs/1803.06969