Identifying graphs from noisy and incomplete data

Title: Identifying graphs from noisy and incomplete data
Publication Type: Conference Papers
Year of Publication: 2009
Authors: Namata, Jr. G MS, Getoor L
Conference Name: Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
Date Published: 2009
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-60558-675-5
Keywords: classification, data mining, entity resolution, link prediction, social networks, statistical relational learning
Abstract:

There is a growing wealth of data describing networks of various types, including social networks, physical networks such as transportation or communication networks, and biological networks. At the same time, there is a growing interest in analyzing these networks, in order to uncover general laws that govern their structure and evolution, and patterns and predictive models to develop better policies and practices. However, a fundamental challenge in dealing with this newly available observational data describing networks is that the data is often of dubious quality -- it is noisy and incomplete -- and before any analysis method can be applied, the data must be cleaned, and missing information inferred. In this paper, we introduce the notion of graph identification, which explicitly models the inference of a "cleaned" output network from a noisy input graph. It is this output network that is appropriate for further analysis. We present an illustrative example and use the example to explore the types of inferences involved in graph identification, as well as the challenges and issues involved in combining those inferences. We then present a simple, general approach to combining the inferences in graph identification and experimentally show the utility of our combined approach and how the performance of graph identification is sensitive to the inter-dependencies among these inferences.

URL: http://doi.acm.org/10.1145/1610555.1610559
DOI: 10.1145/1610555.1610559