Online Latent Dirichlet Allocation with Infinite Vocabulary
Title | Online Latent Dirichlet Allocation with Infinite Vocabulary |
Publication Type | Conference Papers |
Year of Publication | 2013 |
Authors | Zhai K, Boyd-Graber J |
Conference Name | International Conference on Machine Learning |
Abstract | Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not in streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process, whose base distribution is a distribution over all strings, rather than from a finite Dirichlet distribution. We develop inference using online variational inference and, to consider only a finite number of words for each topic, propose heuristics to dynamically order, expand, and contract the set of words we consider in our vocabulary. We show that our model can successfully incorporate new words and that it performs better than topic models with finite vocabularies in evaluations of topic quality and classification. |