“Surprises in High-Dimensional Ridgeless Least Squares Interpolation”, Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani (2019-03-19):

Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type.

In this paper, we study minimum ℓ2-norm (“ridgeless”) interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ ℝ^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ ℝ^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ ℝ^d, W ∈ ℝ^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i).
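To make the two feature models and the estimator concrete, here is a minimal NumPy sketch (ours, not from the paper): it builds linear features x_i = Σ^{1/2} z_i and nonlinear features x_i = φ(W z_i), then fits the minimum ℓ2-norm interpolator β̂ = X⁺y, i.e. the limit of ridge regression as the penalty tends to zero. The Gaussian inputs, the AR(1)-style covariance Σ, the ReLU activation, and the 1/√d scaling of W are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 100, 300, 50  # overparametrized regime: p > n

# Linear feature model: x_i = Sigma^{1/2} z_i with i.i.d. Gaussian z_i.
# Sigma is an illustrative AR(1)-style covariance (our choice); its
# Cholesky factor plays the role of Sigma^{1/2}.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
L = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((n, p))
X_lin = Z @ L.T  # rows have covariance Sigma

# Nonlinear feature model: x_i = phi(W z_i), W with i.i.d. entries,
# phi acting componentwise (ReLU used here as an example activation).
W = rng.standard_normal((p, d)) / np.sqrt(d)
Z_in = rng.standard_normal((n, d))
X_nonlin = np.maximum(W @ Z_in.T, 0.0).T  # shape (n, p)

# Minimum l2-norm ("ridgeless") interpolator: beta_hat = X^+ y.
# With p > n and generic X it fits the training data exactly.
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X_lin @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = np.linalg.pinv(X_lin) @ y

print("max |train residual|:", np.abs(X_lin @ beta_hat - y).max())  # ~0
```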

We recover—in a precise quantitative way—several phenomena that have been observed in large-scale neural networks and kernel machines, including the “double descent” behavior of the prediction risk, and the potential benefits of overparametrization.
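As a quick empirical check of the double-descent shape, the sketch below (our construction, with isotropic Gaussian features, signal normalized to ‖β‖2 = 1, and noise level σ = 0.5, none of which is specified by the abstract) sweeps the overparametrization ratio γ = p/n: the risk of the min-norm interpolator spikes at the interpolation threshold γ = 1 and decreases again as γ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps = 200, 0.5, 10  # samples, noise s.d., repetitions per ratio

def excess_risk(p):
    """Out-of-sample excess risk of the min-l2-norm interpolator, one draw.

    With isotropic Gaussian features, E[(x^T (beta_hat - beta))^2]
    equals ||beta_hat - beta||^2, so we report that directly.
    """
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)      # fix signal strength ||beta|| = 1
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y  # OLS for p < n; min-norm interpolator for p > n
    return np.sum((beta_hat - beta) ** 2)

# Sweep gamma = p/n through the interpolation threshold gamma = 1:
# the risk diverges near 1, then descends again ("double descent").
for gamma in [0.2, 0.5, 0.8, 0.95, 1.05, 1.5, 2.0, 5.0, 10.0]:
    p = int(gamma * n)
    print(f"p/n = {gamma:5.2f}   risk ≈ {np.mean([excess_risk(p) for _ in range(reps)]):.3f}")
```

In this isotropic toy setting the second descent levels off near the null risk ‖β‖2² rather than reaching zero; the paper's contribution is to characterize such curves exactly in the high-dimensional limit.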