"Limitations of the NTK for Understanding Generalization in Deep Learning" Neural tangent kernels scale worse with dataset size than the actual neural network. This holds for the infinite-width limit and the finite-width emiprical NTK. [1/6]

Jul 1, 2022 · 5:55 PM UTC

This result seems to be robust to various choices of hyperparameters. [2/6]
They also find evidence that we can’t split training into two distinct regimes for the purposes of NTK scaling laws. That is, you might hope that the network stabilizes after a few epochs and the NTK starts working better from there, but this doesn’t happen in practice. [3/6]
Instead, the NTK keeps scaling at a consistent rate throughout training. [4/6]
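Concretely, "NTK scaling" here means the test error you get from kernel regression with the NTK as the training set grows, compared against the trained network's error at the same dataset sizes. A hedged sketch of that predictor, reusing the `empirical_ntk` helper above (my own illustration, not the paper's code; data and variable names hypothetical):

```python
import jax.numpy as jnp

def ntk_regression_predict(K_train, y_train, K_test_train, ridge=1e-6):
    """Kernel (ridge) regression with a precomputed NTK Gram matrix:
    f(x*) = K(x*, X) (K(X, X) + ridge * I)^(-1) y."""
    n = K_train.shape[0]
    alpha = jnp.linalg.solve(K_train + ridge * jnp.eye(n), y_train)
    return K_test_train @ alpha

# With the empirical_ntk sketch above (shapes only; data hypothetical):
# K_train      = empirical_ntk(params, X_train, X_train)   # (n, n)
# K_test_train = empirical_ntk(params, X_test,  X_train)   # (m, n)
# y_pred       = ntk_regression_predict(K_train, y_train, K_test_train)
```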
@PreetumNakkiran For more paper summaries, you might like following @mosaicml, me, or my newsletter: bit.ly/3OXJbDs
As always, comments and corrections welcome! [6/6]
"Limitations of the NTK for Understanding Generalization in Deep Learning" Neural tangent kernels scale worse with dataset size than the actual neural network. This holds for the infinite-width limit and the finite-width emiprical NTK. [1/6]