Identifying graphs from noisy and incomplete data
Title | Identifying graphs from noisy and incomplete data |
Publication Type | Conference Papers |
Year of Publication | 2009 |
Authors | Namata, Jr. G MS, Getoor L |
Conference Name | Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data |
Date Published | 2009 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-60558-675-5 |
Keywords | classification, data mining, entity resolution, link prediction, social networks, statistical relational learning |
Abstract | There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences. |
URL | http://doi.acm.org/10.1145/1610555.1610559 |
DOI | 10.1145/1610555.1610559 |