Use of OCR for Rapid Construction of Bilingual Lexicons
Title | Use of OCR for Rapid Construction of Bilingual Lexicons |
Publication Type | Reports |
Year of Publication | 2003 |
Authors | Karagol-Ayan B, Doermann D, Dorr BJ |
Date Published | 2003/07 |
Institution | University of Maryland, College Park |
Abstract | This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based and an HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better on dictionaries with a simple structure; (2) the stochastic method performs better on dictionaries with an enriched structure; (3) regardless of the degree of dictionary richness, the rule-based method gives better results for phrasal entries than for single-word entries; and (4) our resulting bilingual lexicons are comprehensive enough to provide reasonable MT results when compared to human-constructed lexicons. |