A topic-based Document Correlation Model
Title | A topic-based Document Correlation Model |
Publication Type | Conference Papers |
Year of Publication | 2008 |
Authors | Jia X-P, Peng H, Zheng Q-L, Zhuolin Jiang, Li Z |
Conference Name | Machine Learning and Cybernetics, 2008 International Conference on |
Date Published | 2008/07 |
Keywords | bipartite graph optimal matching, data mining, document correlation analysis, document retrieval, Gibbs sampling, Information retrieval, latent Dirichlet allocation model, text analysis, text mining, topic-based document correlation model |
Abstract | Document correlation analysis is now a focus of study in text mining. This paper proposes a Document Correlation Model to capture the correlation between documents at the topic level. The model represents the document correlation as the optimal matching of a bipartite graph, in which each partition is a document, each node is a topic, and each edge is the similarity between two topics. The topics of each document are retrieved by the Latent Dirichlet Allocation model and Gibbs sampling. Experiments on correlated document search show that the Document Correlation Model outperforms the Vector Space Model in two respects: 1) it has higher average retrieval precision; 2) it needs less space to store a document's information. |
DOI | 10.1109/ICMLC.2008.4620826 |
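
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the topic similarity measure, the matching algorithm, or the normalisation, so cosine similarity, a brute-force search over assignments, and division by the larger topic count are all assumptions made here for illustration. The function names `cosine` and `document_correlation` are hypothetical, and each topic is assumed to be already extracted (for example by LDA with Gibbs sampling) as a mapping from term to weight.

```python
from itertools import permutations
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two topics given as term -> weight dicts.
    Assumption: the abstract does not name the similarity measure."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def document_correlation(topics_a, topics_b):
    """Score two documents by the best one-to-one matching of their topics.

    Each document is one side of a bipartite graph, each topic is a node,
    and each edge weight is the similarity of the two topics it joins.
    Brute force over assignments stands in for a proper optimal-matching
    algorithm and is only practical for a handful of topics.
    """
    small, large = sorted((topics_a, topics_b), key=len)
    if not small:
        return 0.0
    sim = [[cosine(s, t) for t in large] for s in small]
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(len(large)), len(small))
    )
    # Assumption: normalise by the larger topic count so the score is in [0, 1].
    return best / len(large)


# Toy topics, invented for illustration only.
doc_a = [{"graph": 0.6, "matching": 0.4}, {"topic": 0.7, "model": 0.3}]
doc_b = [{"topic": 0.7, "model": 0.3}, {"graph": 0.6, "matching": 0.4}]
doc_c = [{"protein": 1.0}]

score_ab = document_correlation(doc_a, doc_b)  # same topics, different order
score_ac = document_correlation(doc_a, doc_c)  # no shared terms
```

For realistic topic counts, a polynomial-time assignment solver such as the Kuhn-Munkres (Hungarian) algorithm would replace the permutation search; the scoring idea stays the same.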