D-Dupe: An Interactive Tool for Entity Resolution in Social Networks

Title: D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Publication Type: Conference Papers
Year of Publication: 2006
Authors: Bilgic M, Licamele L, Getoor L, Shneiderman B
Conference Name: Visual Analytics Science And Technology, 2006 IEEE Symposium On
Date Published: 2006/11/31/2
Keywords: algorithm;data, analysis;social, computing;user, D-Dupe, interactive, interface;data, interfaces;, mining, mining;interactive, network, problem;entity, QUALITY, resolution;entity-resolution;social, sciences, systems;social, tool;data, visual, visualization;task-specific
Abstract

Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity-resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to visually identify. D-Dupe users resolve ambiguities either by merging nodes or by marking them distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result. We illustrate and evaluate the benefits of D-Dupe on three bibliographic collections. Two of the datasets had already been cleaned, and therefore should not have contained duplicates; despite this fact, many duplicates were rapidly identified using D-Dupe's unique combination of entity resolution algorithms within a task-specific visual interface.

DOI: 10.1109/VAST.2006.261429