Searching documentation using text, OCR, and image

Title: Searching documentation using text, OCR, and image
Publication Type: Conference Papers
Year of Publication: 2009
Authors: Tom Yeh, B. Katz
Conference Name: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Date Published: 2009
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-60558-483-6
Keywords: Computer vision, content-based image retrieval, multimodal search
Abstract

We describe a mixed-modality method to index and search software documentation in three ways: plain text, OCR text of embedded figures, and visual features of these figures. Using a corpus of 102 computer books with a total of 62,943 pages and 75,800 figures, we empirically demonstrate that our method achieves better precision and recall than do alternatives based on a single modality.
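The abstract does not say how the three modalities are combined, so the following is only a minimal sketch of one common option, weighted late fusion, and not the authors' method. Every page is scored separately against its body text, the OCR text of its figures, and a visual feature vector, and the three cosine similarities are summed with fixed weights. The `Page` fields, the bag-of-visual-words histogram used as the visual feature, the `search` function, and the weights `(0.5, 0.3, 0.2)` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    page_id: str
    text: str            # body text of the page
    ocr_text: str        # OCR text extracted from the page's figures
    visual: list         # hypothetical visual-word histogram of the page's figures


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def terms(s: str) -> Counter:
    """Naive bag-of-words term counts (lowercased, whitespace-split)."""
    return Counter(s.lower().split())


def sparse(xs: list) -> dict:
    """Turn a dense histogram into a sparse {index: value} dict."""
    return {i: x for i, x in enumerate(xs) if x}


def search(pages, query_text, query_visual=None, weights=(0.5, 0.3, 0.2)):
    """Rank pages by a weighted sum of text, OCR, and visual similarity."""
    w_text, w_ocr, w_vis = weights  # hypothetical fusion weights
    q = terms(query_text)
    scored = []
    for p in pages:
        score = w_text * cosine(q, terms(p.text))
        score += w_ocr * cosine(q, terms(p.ocr_text))
        if query_visual is not None:  # visual term only when a query image is given
            score += w_vis * cosine(sparse(query_visual), sparse(p.visual))
        scored.append((score, p.page_id))
    scored.sort(reverse=True)
    return [(pid, round(s, 3)) for s, pid in scored]
```

As a usage example, a page whose figure OCR text contains "Proxy Host Port" is retrieved for the query "proxy host" even though its body text mentions only "proxy". This is the kind of match that a text-only index scores lower. Late fusion keeps each modality's index independent, which makes it easy to drop a modality (for example, when the query has no image) at the cost of having to tune the weights.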

URL: http://doi.acm.org/10.1145/1571941.1572123
DOI: 10.1145/1571941.1572123