Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce
Title | Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce |
Publication Type | Conference Papers |
Year of Publication | 2012 |
Authors | Zhai K, Boyd-Graber J, Asadi N, Alkhouja M |
Conference Name | Proceedings of ACM International Conference on World Wide Web, 2012 |
Date Published | 2012 |
Abstract | Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for exploring document collections. Because of the increasing prevalence of large datasets, there is a need to improve the scalability of inference for LDA. In this paper, we introduce a novel and flexible large scale topic modeling package in MapReduce (Mr. LDA). As opposed to other techniques which use Gibbs sampling, our proposed framework uses variational inference, which easily fits into a distributed environment. More importantly, this variational implementation, unlike highly tuned and specialized implementations based on Gibbs sampling, is easily extensible. We demonstrate two extensions of the models possible with this scalable framework: informed priors to guide topic discovery and extracting topics from a multilingual corpus. We compare the scalability of Mr. LDA against Mahout, an existing large scale topic modeling package. Mr. LDA out-performs Mahout both in execution speed and held-out likelihood. |