Effects of term segmentation on Chinese/English cross-language information retrieval
Title | Effects of term segmentation on Chinese/English cross-language information retrieval |
Publication Type | Conference Papers |
Year of Publication | 1999 |
Authors | Oard D, Wang J |
Conference Name | String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware |
Date Published | 1999 |
Publisher | IEEE |
ISBN Number | 0-7695-0268-7 |
Keywords | alternative term weighting strategies, cascading effect, Chinese segmentation techniques, Chinese/English cross-language information retrieval, Chromium, CLIR problems, Cross-Language Information Retrieval research, data mining, Dictionaries, dictionary-based Chinese, East Asian languages, English query translation, European languages, future work, Information retrieval, language translation, linguistics, natural language processing, natural languages, productive directions, Reactive power, retrieval effectiveness, task-tuned segmentation algorithms, technical terms, term segmentation, text analysis, written Chinese texts |
Abstract | The majority of recent Cross-Language Information Retrieval (CLIR) research has focused on European languages. CLIR problems that involve East Asian languages such as Chinese introduce additional challenges, because written Chinese texts lack boundaries between terms. The paper examines three Chinese segmentation techniques in combination with two variants of dictionary-based Chinese-to-English query translation. The results indicate that failure to segment terms, particularly technical terms and names, can have a cascading effect that reduces retrieval effectiveness. Task-tuned segmentation algorithms and alternative term weighting strategies are suggested as productive directions for future work. |
DOI | 10.1109/SPIRE.1999.796590 |