The Detection of Duplicates in Document Image Databases

Title: The Detection of Duplicates in Document Image Databases
Publication Type: Conference Papers
Year of Publication: 1997
Authors: Doermann D, Li H, Kia O
Conference Name: ICDAR
Date Published: 1997
Abstract

The problem of off-line handwritten character recognition has eluded a satisfactory solution for several decades. Researchers working in the area of on-line recognition have had greater success, but the possibility of extracting on-line information from static images has not been fully explored. The experience of forensic document examiners assures us that in many cases, such information can be successfully recovered. We outline the design of a system for the recovery of temporal information from static handwritten images. We provide a taxonomy of local, regional and global temporal clues which are often found in handwritten samples, and describe methods for recovering these clues from the image.
We show how this system can benefit from obtaining a comprehensive understanding of the handwriting signal and a detailed analysis of stroke and sub-stroke properties. We suggest that the recovery task requires that we break away from traditional thresholding and thinning techniques, and we provide a framework for such analysis. We demonstrate how isolated temporal clues can reliably be extracted from this framework and propose a control structure for integrating the partial information.
We show how many seemingly ambiguous situations can be resolved by the derived clues and our knowledge of the writing process, and provide several examples to illustrate our approach.
The support of this research by the Ricoh Corporation is gratefully acknowledged.