“Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis”, 2003:
Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry.
This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks.
The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data.
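To illustrate what expanding a training set with distorted data can look like, here is a minimal sketch. The abstract does not specify the distortion, so this assumes an elastic warp driven by a smoothed random displacement field. The function names (`elastic_distort`, `smooth`, `bilinear_sample`) and the default values of `alpha` and `sigma` are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(field, sigma):
    """Separable Gaussian blur of a 2-D array, zero-padded at the borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(field, r, mode="constant")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def bilinear_sample(image, ys, xs):
    """Sample image at fractional (ys, xs); points outside the image read as 0."""
    h, w = image.shape
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    wy = ys - y0
    wx = xs - x0
    out = np.zeros(ys.shape, dtype=float)
    corners = (
        (0, 0, (1 - wy) * (1 - wx)),
        (0, 1, (1 - wy) * wx),
        (1, 0, wy * (1 - wx)),
        (1, 1, wy * wx),
    )
    for dy, dx, weight in corners:
        yy, xx = y0 + dy, x0 + dx
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        out[inside] += weight[inside] * image[yy[inside], xx[inside]]
    return out


def elastic_distort(image, alpha=34.0, sigma=4.0, rng=None):
    """Warp a 2-D image with a smoothed random displacement field.

    alpha scales the displacement strength and sigma controls its smoothness;
    both defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    dx = alpha * smooth(rng.uniform(-1, 1, (h, w)), sigma)
    dy = alpha * smooth(rng.uniform(-1, 1, (h, w)), sigma)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(image.astype(float), ys + dy, xs + dx)
```

Calling `elastic_distort` several times on each training image, with a fresh random field each time, yields additional labeled examples that keep the original label. Setting `alpha=0` returns the image unchanged, which is a convenient sanity check.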
The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple “do-it-yourself” implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture.
The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis.
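A "do-it-yourself" convolutional forward pass of the kind described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer sizes, the stride of 2, the `tanh` nonlinearity, the 29x29 input size, and the names `conv2d`, `init_params`, and `forward` are all hypothetical choices, not details given in the abstract. Training is omitted.

```python
import numpy as np


def conv2d(x, kernels, bias, stride=1):
    """Valid cross-correlation, the operation CNN layers usually call convolution.

    x:       (in_channels, H, W)
    kernels: (out_channels, in_channels, kH, kW)
    bias:    (out_channels,)
    returns: (out_channels, H_out, W_out)
    """
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    h_out = (h - kh) // stride + 1
    w_out = (w - kw) // stride + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            patch = x[:, top:top + kh, left:left + kw]
            # Contract every kernel with the patch over channels and both spatial axes.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3) + bias
    return out


def init_params(rng, side=29, maps1=5, maps2=50, hidden=100, classes=10, k=5):
    """Small random weights; all sizes here are illustrative assumptions."""
    s1 = (side - k) // 2 + 1  # spatial size after the first stride-2 layer
    s2 = (s1 - k) // 2 + 1    # spatial size after the second stride-2 layer

    def small(*shape):
        return rng.normal(0.0, 0.05, shape)

    return {
        "k1": small(maps1, 1, k, k), "b1": np.zeros(maps1),
        "k2": small(maps2, maps1, k, k), "b2": np.zeros(maps2),
        "w3": small(hidden, maps2 * s2 * s2), "b3": np.zeros(hidden),
        "w4": small(classes, hidden), "b4": np.zeros(classes),
    }


def forward(image, params):
    """Two strided convolution layers followed by two fully connected layers."""
    h = np.tanh(conv2d(image[None], params["k1"], params["b1"], stride=2))
    h = np.tanh(conv2d(h, params["k2"], params["b2"], stride=2))
    h = np.tanh(params["w3"] @ h.ravel() + params["b3"])
    return params["w4"] @ h + params["b4"]  # one raw score per class
```

The strided convolutions reduce resolution directly, so the sketch needs no separate averaging or pooling layers, in keeping with the simplicity the abstract argues for. Changing `side`, the number of feature maps, or the kernel size in `init_params` is enough to try a different architecture.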
We illustrate our claims on the MNIST set of English digit images.