Mandarin-English Information (MEI): investigating translingual speech retrieval
Title | Mandarin-English Information (MEI): investigating translingual speech retrieval |
Publication Type | Journal Articles |
Year of Publication | 2004 |
Authors | Meng HM, Chen B, Khudanpur S, Levow G-A, Lo W-K, Oard D, Schone P, Tang K, Wang H-M, Wang J |
Journal | Computer Speech & Language |
Volume | 18 |
Issue | 2 |
Pagination | 163–179 |
Date Published | 2004/04 |
ISSN | 0885-2308 |
Keywords | English–Chinese cross-language spoken document retrieval, Multi-scale spoken document retrieval |
Abstract | This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, where we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks – multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that the use of phrase-based translation and subword translation gave performance gains, and multi-scale retrieval outperforms word-based retrieval. |
URL | http://www.sciencedirect.com/science/article/pii/S0885230803000524 |
DOI | 10.1016/j.csl.2003.09.003 |