Bibliography (9):

  1. Deep Double Descent: the double descent phenomenon occurs in CNNs, ResNets, and transformers; performance first improves, then gets worse, then improves again as model size, data size, or training time increases

  2. https://arxiv.org/abs/1706.04454

  3. Visualizing the Loss Landscape of Neural Nets

  4. Essentially No Barriers in Neural Network Energy Landscape

  5. A jamming transition from under-parameterization to over-parameterization affects loss landscape and generalization

  6. https://arxiv.org/abs/1803.06969