Breaking the resource bottleneck for multilingual parsing
Title | Breaking the resource bottleneck for multilingual parsing |
Publication Type | Reports |
Year of Publication | 2005 |
Authors | Hwa R, Resnik P, Weinberg A |
Date Published | 2005 |
Institution | Institute for Advanced Computer Studies, University of Maryland, College Park |
Abstract | We propose a framework that enables the acquisition of annotation-heavy resources, such as syntactic dependency tree corpora, for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied. |
URL | http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA440432 |
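The projection step the abstract describes (an English parse plus a word alignment yielding target-language dependency trees) can be illustrated with a minimal sketch. It assumes one-to-one word alignments and encodes the English parse as a list of head indices; `project_dependencies` is a hypothetical helper, not the authors' code, and it omits the filtering and error-correction steps the abstract mentions.

```python
from typing import Dict, List, Optional, Tuple


def project_dependencies(
    english_heads: List[Optional[int]],
    alignment: Dict[int, int],
) -> List[Tuple[int, int]]:
    """Hypothetical helper: copy English dependency edges onto a target sentence.

    english_heads[i] is the index of the head of English word i (None marks the root).
    alignment maps an English word index to a target word index; this sketch
    assumes the alignment is one-to-one.
    Returns sorted (head, dependent) pairs over target-language word indices.
    """
    edges: List[Tuple[int, int]] = []
    for dep, head in enumerate(english_heads):
        if head is None:
            continue  # the English root has no head edge to project
        # An edge is projected only when both of its endpoints are aligned.
        if dep in alignment and head in alignment:
            edges.append((alignment[head], alignment[dep]))
    return sorted(edges)


if __name__ == "__main__":
    # English: I(0) saw(1) the(2) cat(3); "saw" is the root.
    english_heads = [1, None, 3, 1]
    # Illustrative Chinese-side indices; English "the" is left unaligned.
    alignment = {0: 0, 1: 1, 3: 3}
    print(project_dependencies(english_heads, alignment))
```

In this toy case the edges saw→I and saw→cat carry over, while any target word with no aligned English counterpart receives no head. Gaps of this kind are one source of the noise in an induced treebank.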