"Limitations of the NTK for Understanding Generalization in Deep Learning"
Neural tangent kernels scale worse with dataset size than the actual neural network they approximate. This holds for both the infinite-width limit and the finite-width empirical NTK. [1/6]
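(Aside, not from the paper: a minimal JAX sketch of what the finite-width empirical NTK is, i.e. the Gram matrix of per-example parameter gradients. The tiny MLP, shapes, and initialization here are my own illustrative choices.)

```python
import jax
import jax.numpy as jnp

# Hypothetical two-layer MLP used only for illustration.
def mlp(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return (h @ w2 + b2).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Finite-width empirical NTK: K[i, j] = <grad_theta f(x1_i), grad_theta f(x2_j)>."""
    def flat_grad(x):
        # Gradient of the scalar output w.r.t. all parameters, flattened to one vector.
        grads = jax.grad(lambda p: mlp(p, x[None, :])[0])(params)
        return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])
    j1 = jax.vmap(flat_grad)(x1)   # (n1, n_params)
    j2 = jax.vmap(flat_grad)(x2)   # (n2, n_params)
    return j1 @ j2.T               # (n1, n2)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
d, width = 4, 64
params = (
    jax.random.normal(k1, (d, width)) / jnp.sqrt(d),
    jnp.zeros(width),
    jax.random.normal(k2, (width, 1)) / jnp.sqrt(width),
    jnp.zeros(1),
)
x = jax.random.normal(k3, (8, d))
print(empirical_ntk(params, x, x).shape)  # (8, 8)
```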
They also find evidence that we can't think of training as two distinct regimes for the purposes of NTK scaling laws. I.e., you might hope that the network stabilizes after a few epochs and the empirical NTK computed at that point starts matching the network's scaling, but this doesn't happen in practice. [3/6]
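(Again not from the paper, just to make the "two regimes" test concrete: kernel regression with the empirical NTK taken at a later checkpoint instead of at initialization. This reuses `empirical_ntk` from the sketch above; the ridge term `reg` is an arbitrary choice.)

```python
import jax.numpy as jnp

def ntk_kernel_regression(params, x_train, y_train, x_test, reg=1e-6):
    """Kernel ridge regression using the empirical NTK evaluated at `params`.

    Running this with `params` from initialization vs. from a later checkpoint,
    across several training-set sizes, is one way to probe whether the
    "stabilized" NTK closes the scaling gap with the trained network.
    """
    k_train = empirical_ntk(params, x_train, x_train)  # (n, n)
    k_test = empirical_ntk(params, x_test, x_train)    # (m, n)
    alpha = jnp.linalg.solve(k_train + reg * jnp.eye(k_train.shape[0]), y_train)
    return k_test @ alpha                              # (m,)
```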
Paper: arxiv.org/abs/2206.10012
If you like this paper, consider RTing this (or another!) thread to publicize the authors' work, or following the authors: @Nikhilkvyas @whybansal @PreetumNakkiran [5/6]
For more paper summaries, you might like following @mosaicml, me, or my newsletter: bit.ly/3OXJbDs
As always, comments and corrections welcome! [6/6]