Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

1 Jul 2022

"Limitations of the NTK for Understanding Generalization in Deep Learning" Neural tangent kernels scale worse with dataset size than the actual neural network. This holds for the infinite-width limit and the finite-width emiprical NTK. [1/6]

Jul 1, 2022 · 5:55 PM UTC

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

@davisblalock

1 Jul 2022

This result seems to be robust to various choices of hyperparameters: [2/6]

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

@davisblalock

1 Jul 2022

They also find evidence that we can’t think of training as two distinct regimes for the purposes of NTK scaling laws. I.e., while you might hope that the network stabilizes after a few epochs and the NTK starts working better, this doesn’t happen in practice. [3/6]

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

@davisblalock

1 Jul 2022

Instead, the NTK keeps scaling at a consistent rate throughout training. [4/6]

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

@davisblalock

1 Jul 2022

Paper: arxiv.org/abs/2206.10012 If you like this paper, consider RTing this (or another!) thread to publicize the authors' work, or following the authors: @Nikhilkvyas @whybansal… [5/6]

Limitations of the NTK for Understanding Generalization in Deep Learning

The ``Neural Tangent Kernel'' (NTK) (Jacot et al 2018), and its empirical variants have been proposed as a proxy to capture certain behaviors of real neural networks. In this work, we study NTKs...

arxiv.org

Davis Blalock · Jul 1, 2022 · 5:55 PM UTC

Davis Blalock

@davisblalock

1 Jul 2022

…@PreetumNakkiran For more paper summaries, you might like following @mosaicml, me, or my newsletter: bit.ly/3OXJbDs As always, comments and corrections welcome! [6/6]

Davis Blalock

@davisblalock

1 Jul 2022