“Extraction De Séquences Numériques Dans Des Documents Manuscrits Quelconques”, Clément Chatelain, 2006-12-05:

Within the framework of the automatic processing of incoming mail, we present in this thesis the design and development of a system for extracting numerical fields from weakly constrained handwritten documents.

Although the recognition of isolated handwritten entities can be considered a partially solved problem, extracting information from images of complex, free-layout documents is still a challenge. This problem requires both handwriting recognition methods and information extraction methods inspired by approaches developed for electronic documents.

Our contribution consists of the design and implementation of 2 different strategies: the first extends classical handwriting recognition methods, while the second is inspired by approaches used for information extraction in electronic documents.

The results obtained on a real handwritten mail database show that our second approach is substantially better.

Finally, a complete, generic and efficient system is produced, addressing one of the emerging directions in the automatic reading of handwritten documents: the extraction of complex information from document images. [Text of paper is in French.]