D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Title | D-Dupe: An Interactive Tool for Entity Resolution in Social Networks |
Publication Type | Conference Papers |
Year of Publication | 2006 |
Authors | Bilgic M, Licamele L, Getoor L, Shneiderman B |
Conference Name | Visual Analytics Science And Technology, 2006 IEEE Symposium On |
Date Published | 2006/10/31-2006/11/02 |
Keywords | D-Dupe interactive tool; data mining; data mining algorithm; data quality problem; entity resolution; entity-resolution; interactive systems; social network analysis; social network visualization; social sciences computing; task-specific visual interface; user interfaces |
Abstract | Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to identify visually. D-Dupe users resolve ambiguities either by merging nodes or by marking them as distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned and therefore should not have contained duplicates; despite this, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface. |
DOI | 10.1109/VAST.2006.261429 |