“Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, Patrice Y. Simard, Dave Steinkraus, John C. Platt, 2003:

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry.

This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks.

The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data.
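As a rough illustration of expanding a training set with distorted copies, the sketch below appends randomly shifted versions of each labeled image. The abstract does not specify the distortion, so the small random translation used here is only a stand-in, and the names `shift_image` and `expand_training_set`, along with their parameters, are hypothetical.

```python
import random


def shift_image(image, dx, dy, fill=0.0):
    """Return a copy of a 2D list-of-lists image shifted by (dx, dy) pixels.

    Pixels shifted in from outside the frame are set to `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def expand_training_set(samples, copies_per_sample=4, max_shift=2, seed=0):
    """Keep the original (image, label) pairs and append distorted copies.

    Assumption: a random translation stands in for the distortion; the
    label is unchanged because a small shift does not alter the digit.
    """
    rng = random.Random(seed)
    expanded = list(samples)
    for image, label in samples:
        for _ in range(copies_per_sample):
            dx = rng.randint(-max_shift, max_shift)
            dy = rng.randint(-max_shift, max_shift)
            expanded.append((shift_image(image, dx, dy), label))
    return expanded
```

With `copies_per_sample=4`, a set of N labeled images grows to 5N. Any label-preserving distortion could be substituted for `shift_image` without changing the surrounding loop.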

The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple “do-it-yourself” implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture.
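To give a sense of how small a "do-it-yourself" convolution can be, here is a minimal sketch of the forward pass for a single feature map in plain Python. It is not the authors' implementation: the function name `conv2d_valid`, the "valid" (no padding) boundary handling, and the `stride` and `bias` parameters are assumptions made for illustration.

```python
def conv2d_valid(image, kernel, stride=1, bias=0.0):
    """Slide `kernel` over `image` and return the map of weighted sums.

    Both arguments are 2D lists of numbers. As is usual in neural
    networks, the kernel is not flipped (this is cross-correlation).
    No padding is applied, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # The same kernel weights are reused at every position.
            for u in range(kh):
                for v in range(kw):
                    total += kernel[u][v] * image[i * stride + u][j * stride + v]
            row.append(total)
        output.append(row)
    return output
```

Because the kernel weights are shared across positions, a layer built this way has `kh * kw + 1` trainable parameters per feature map regardless of image size, whereas a fully connected layer needs one weight per input pixel for every output unit. A stride greater than 1 shrinks the output map, which is one way to subsample without a separate averaging layer.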

The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis.

We illustrate our claims on the MNIST set of English digit images.