Deep Double Descent: Where Bigger Models and More Data Hurt
Deep Residual Learning for Image Recognition
Deep Double Descent: We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time.
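The same model-size curve can be reproduced in a toy setting. A minimal sketch (my own illustration, not from the papers above, which use CNNs/ResNets/transformers): ridgeless random-feature regression where test error rises as the feature count approaches the number of training points (the interpolation threshold) and then falls again as the model grows past it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear target plus label noise.
n_train, n_test, d = 40, 200, 5
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
w_true = rng.normal(size=d)
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_test = X_test @ w_true


def relu_features(X, W):
    """Fixed random ReLU features; only the linear readout is fit."""
    return np.maximum(X @ W, 0.0)


widths = [5, 10, 20, 40, 80, 160, 320]  # interpolation threshold at n_train = 40
test_errs = []
for p in widths:
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi_tr = relu_features(X_train, W)
    Phi_te = relu_features(X_test, W)
    # lstsq returns the minimum-norm interpolating solution once p > n_train.
    beta, *_ = np.linalg.lstsq(Phi_tr, y_train, rcond=None)
    test_errs.append(float(np.mean((Phi_te @ beta - y_test) ** 2)))

for p, e in zip(widths, test_errs):
    print(f"width={p:4d}  test MSE={e:.3f}")
```

Plotting test MSE against width shows the characteristic peak near width 40 followed by a second descent, the linear-regression analogue of the phenomenon described in the papers above.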
Understanding ‘Deep Double Descent’
Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
https://x.com/francoisfleuret/status/1269301689095503872
https://arxiv.org/pdf/2105.14368.pdf#page=18
Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
https://windowsontheory.org/2019/12/05/deep-double-descent/
Reconciling modern machine learning practice and the bias-variance trade-off
CIFAR-10 and CIFAR-100 Datasets