Improved Word-Level Alignment: Injecting Knowledge about MT Divergences
Title | Improved Word-Level Alignment: Injecting Knowledge about MT Divergences |
Publication Type | Reports |
Year of Publication | 2002 |
Authors | Dorr BJ, Pearl L, Hwa R, Habash N |
Date Published | 2002/02/14 |
Institution | Institute for Advanced Computer Studies, Univ of Maryland, College Park |
Keywords | *LEXICOGRAPHY, *MACHINE TRANSLATION, *STATISTICAL ANALYSIS, *WORDS(LANGUAGE), ACQUISITION, ALIGNMENT, EXPERIMENTAL DATA, LANGUAGE, linguistics, MATHEMATICAL MODELS, STATISTICS AND PROBABILITY, TREES |
Abstract | Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and systematically identified. We describe our techniques for modifying English parse trees to form resulting sentences that share more similarity with the sentences in the other languages; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence-handling can improve word-level alignment. |
URL | http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA458774 |