A categorial variation database for English
Title | A categorial variation database for English |
Publication Type | Conference Papers |
Year of Publication | 2003 |
Authors | Habash N, Dorr BJ |
Conference Name | Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1 |
Date Published | 2003 |
Publisher | Association for Computational Linguistics |
Conference Location | Stroudsburg, PA, USA |
Abstract | We describe our approach to the construction and evaluation of a large-scale database called "CatVar" which contains categorial variations of English lexemes. Due to the prevalence of cross-language categorial variation in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%. |
URL | http://dx.doi.org/10.3115/1073445.1073458 |
DOI | 10.3115/1073445.1073458 |