“Reflections After Refereeing Papers for NIPS”, 1995:
The theoretical work by Norbert Wiener and others on the spectral analysis of stationary time series penetrated statistics following Tukey’s heuristic work on estimation of the spectrum. In refereeing papers for NIPS, the author was struck by the growing emphasis on mathematical theory.
Mathematical theory is not critical to the development of machine learning. In machine learning, the current panacea is a sigmoid network fitted using backpropagation. The pi-method, for approximating functions using noisy data, was suggested by results in mathematical approximation theory. In spite of intense activity, none of the work has had any effect on the day-to-day practice of statistics, or even on present-day theory. The list of useful theories was not meant to be inclusive, but even a more inclusive list would be very short. A possible reason is that it is difficult to formulate reasonable analytic models for complex data.
…Uses Of Theory
- Comfort: We knew it worked, but it’s nice to have a proof.
- Insight: Aha! So that’s why it works.
- Innovation: At last, a mathematically proven idea that applies to data.
- Suggestion: Something like this might work with data.
…Our fields would be better off with far fewer theorems, less emphasis on faddish stuff, and much more scientific inquiry and engineering. But the latter requires real thinking. For instance, there are many important questions regarding neural networks which are largely unanswered. There seem to be conflicting stories regarding the following issues:
- Why don’t heavily parameterized neural networks overfit the data?
- What is the effective number of parameters?
- Why doesn’t backpropagation head for a poor local minimum?
- When should one stop the backpropagation and use the current parameters?
It makes research more interesting to know that there is no one universally best method. What is best is data dependent. Sometimes “least glamorous” methods such as nearest neighbor are best. We need to learn more about what works best where. But emphasis on theory often distracts us from doing good engineering and living with the data.
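The claim that what is best is data dependent can be illustrated with a small experiment. The sketch below is not from the essay: the two synthetic datasets, the noise levels, the sample sizes, and the helper names (`fit_line`, `nearest_neighbor_predict`, `compare`) are all assumptions chosen for illustration. It compares a straight-line least-squares fit against a 1-nearest-neighbor predictor on held-out data.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict with the response of the single closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mse(preds, ys):
    """Mean squared error between predictions and observed responses."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def compare(target, noise_sd, n_train=200, n_test=200, seed=0):
    """Held-out MSE of both methods on data drawn from target(x) + noise.

    The data-generating setup here is hypothetical, not taken from the essay.
    """
    rng = random.Random(seed)

    def sample(n):
        xs = [rng.random() for _ in range(n)]
        ys = [target(x) + rng.gauss(0.0, noise_sd) for x in xs]
        return xs, ys

    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)

    a, b = fit_line(train_x, train_y)
    line_preds = [a + b * x for x in test_x]
    nn_preds = [nearest_neighbor_predict(train_x, train_y, x) for x in test_x]
    return {
        "linear_fit": mse(line_preds, test_y),
        "nearest_neighbor": mse(nn_preds, test_y),
    }


# Dataset 1: a straight line buried in heavy noise.
noisy_line = compare(lambda x: 2.0 * x, noise_sd=1.0)
# Dataset 2: a smooth wave observed with little noise.
smooth_wave = compare(lambda x: math.sin(4 * math.pi * x), noise_sd=0.1)

print("noisy line :", noisy_line)
print("smooth wave:", smooth_wave)
```

Under these assumptions the line fit has the lower held-out error on the noisy linear data, while nearest neighbor has the lower error on the smooth wave, so neither method wins on both datasets. Different synthetic data would favor different methods again, which is the point of the passage.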