Divergence Unraveling for Word Alignment...

Title: Divergence Unraveling for Word Alignment...
Publication Type: Conference Papers
Year of Publication: 2004
Authors: Dorr BJ, Ayan NF, Habash N
Conference Name: Natural Language Engineering
Date Published: 2004
Abstract

We describe the use of parallel text for divergence unraveling in word-level alignment. DUSTer (Divergence Unraveling for Statistical Translation) is a system that combines linguistic and statistical knowledge to resolve structural differences between languages, i.e., translation divergences, during the process of alignment. Our immediate goal is to induce word-level alignments that are more accurate than those produced by an existing state-of-the-art statistical system. The long-term goal is to improve the output quality of statistical machine translation and lexical acquisition systems by using DUSTer as one possible input to a framework that accommodates multiple alignments. We show that a systematic characterization of alignment errors made by a statistical system validates the use of linguistically-motivated universal rules for identifying and handling divergences. These rules relate one or more linguistically-motivated categories associated with the (English) input words to those of another language (foreign language); the resulting match sets are used to infer corrected alignments. Using a human-aligned corpus as our gold standard, we demonstrate an improvement in alignments over an existing state-of-the-art alignment algorithm.