Selecting hierarchical clustering cut points for web person-name disambiguation

Title: Selecting hierarchical clustering cut points for web person-name disambiguation
Publication Type: Conference Papers
Year of Publication: 2009
Authors: Gong J, Oard D
Conference Name: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Date Published: 2009
Publisher: ACM
Publisher Location: New York, NY, USA
ISBN Number: 978-1-60558-483-6
Keywords: clustering, person-name disambiguation
Abstract

Hierarchical clustering is often used to group mentions of a person name that refer to the same entity. Since the correct number of clusters for a given person name is not known a priori, some way of deciding where to cut the resulting dendrogram is needed, one that balances the risks of over- and under-clustering. This paper reports on experiments in which outcome-specific and result-set measures are used to learn a global similarity threshold. Results on the Web People Search (WePS-2) task indicate that approximately 85% of the optimal F1 measure can be achieved on held-out data.
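The general idea of cutting a dendrogram at one global similarity threshold, tuned on training names and then reused unchanged on held-out names, can be illustrated with a minimal sketch. It is not the authors' method: it assumes average-link merging, uses the B-cubed F-measure (alpha = 0.5) as the quality score, searches a fixed grid of candidate thresholds, and does not model the outcome-specific or result-set measures the paper studies. The function names and toy similarity matrices are hypothetical.

```python
from itertools import combinations


def agglomerate(sim, threshold):
    """Average-link clustering on a symmetric similarity matrix (assumed linkage).

    Repeatedly merges the most similar pair of clusters and stops once no pair
    reaches `threshold`. Average-link merge similarities never increase, so this
    is equivalent to cutting the average-link dendrogram at `threshold`.
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or score > best:
                best, pair = score, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def bcubed_f(clusters, gold, alpha=0.5):
    """B-cubed F of a predicted partition; gold[i] is the entity id of item i."""
    pred = {i: k for k, c in enumerate(clusters) for i in c}
    n = len(gold)
    precision = recall = 0.0
    for i in range(n):
        same_pred = [j for j in range(n) if pred[j] == pred[i]]
        same_gold = [j for j in range(n) if gold[j] == gold[i]]
        correct = sum(1 for j in same_pred if gold[j] == gold[i])
        precision += correct / len(same_pred)
        recall += correct / len(same_gold)
    precision, recall = precision / n, recall / n
    return 1.0 / (alpha / precision + (1 - alpha) / recall)


def learn_threshold(training, candidates):
    """Pick the one global threshold with the best mean B-cubed F over training names."""
    def mean_f(t):
        return sum(bcubed_f(agglomerate(sim, t), gold) for sim, gold in training) / len(training)
    return max(candidates, key=mean_f)


def block_sim(groups, within, across):
    """Hypothetical toy matrix: `within` for same-entity pairs, `across` otherwise."""
    n = len(groups)
    return [[1.0 if i == j else (within if groups[i] == groups[j] else across)
             for j in range(n)] for i in range(n)]


# Hypothetical training names: (similarity matrix, gold entity ids).
training = [
    (block_sim([0, 0, 1, 1], 0.9, 0.1), [0, 0, 1, 1]),
    (block_sim([0, 0, 1, 1], 0.6, 0.2), [0, 0, 1, 1]),
]
threshold = learn_threshold(training, [0.05, 0.3, 0.5, 0.7, 0.95])

# Apply the learned global threshold unchanged to a held-out name.
held = block_sim([0, 0, 1], 0.7, 0.2)
print(threshold, agglomerate(held, threshold))
```

On this toy data, 0.3 and 0.5 both give a perfect mean score and `max` keeps the first, so the sketch prints `0.3 [[0, 1], [2]]`. A single global threshold is the simplest cut rule because it needs no per-name estimate of the number of clusters; its cost is that one value must suit names whose similarity distributions differ.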

URL: http://doi.acm.org/10.1145/1571941.1572124
DOI: 10.1145/1571941.1572124