Multilingual topic models for unaligned text
Title | Multilingual topic models for unaligned text |
Publication Type | Conference Papers |
Year of Publication | 2009 |
Authors | Boyd-Graber J, Blei DM |
Conference Name | Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence |
Date Published | 2009 |
Publisher | AUAI Press |
Conference Location | Arlington, Virginia, United States |
ISBN Number | 978-0-9749039-5-8 |
Abstract | We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora. |
URL | http://dl.acm.org/citation.cfm?id=1795114.1795124 |