Selecting hierarchical clustering cut points for web person-name disambiguation
Title | Selecting hierarchical clustering cut points for web person-name disambiguation |
Publication Type | Conference Papers |
Year of Publication | 2009 |
Authors | Gong J, Oard D |
Conference Name | Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval |
Date Published | 2009 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-60558-483-6 |
Keywords | clustering, person-name disambiguation |
Abstract | Hierarchical clustering is often used to cluster person-names referring to the same entities. Since the correct number of clusters for a given person-name is not known a priori, some way of deciding where to cut the resulting dendrogram to balance risks of over- or under-clustering is needed. This paper reports on experiments in which outcome-specific and result-set measures are used to learn a global similarity threshold. Results on the Web People Search (WePS)-2 task indicate that approximately 85% of the optimal F1 measure can be achieved on held-out data. |
URL | http://doi.acm.org/10.1145/1571941.1572124 |
DOI | 10.1145/1571941.1572124 |
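
The idea in the abstract, cutting each name's dendrogram at one similarity threshold learned across training names, can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: average-link clustering, pairwise F1 as a stand-in for the WePS-2 evaluation measures, a plain grid search over candidate thresholds, and the toy similarity matrices. The function names (`average_link_clusters`, `pairwise_f1`, `learn_global_threshold`) are hypothetical.

```python
from itertools import combinations


def average_link_clusters(sim, threshold):
    """Merge clusters bottom-up (average link) until the best available
    merge has similarity below `threshold`. For average link this is
    equivalent to cutting the dendrogram at that similarity."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if score > best:
                best, pair = score, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pairwise_f1(clusters, gold):
    """F1 over document pairs placed in the same cluster.
    Assumption: a stand-in measure, not the official WePS-2 scorer."""
    predicted = {frozenset(p) for c in clusters for p in combinations(c, 2)}
    truth = {frozenset(p) for p in combinations(range(len(gold)), 2)
             if gold[p[0]] == gold[p[1]]}
    if not predicted and not truth:
        return 1.0
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(predicted), hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def learn_global_threshold(train, candidates):
    """Pick the single threshold with the highest mean F1 over all
    training names; `train` is a list of (similarity matrix, gold labels)."""
    def mean_f1(t):
        scores = [pairwise_f1(average_link_clusters(s, t), g) for s, g in train]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_f1)


# Toy data (invented): two person names with document-document similarities.
sim_a = [[1.0, 0.8, 0.2, 0.1],
         [0.8, 1.0, 0.3, 0.2],
         [0.2, 0.3, 1.0, 0.7],
         [0.1, 0.2, 0.7, 1.0]]
gold_a = [0, 0, 1, 1]          # two distinct people share this name
sim_b = [[1.0, 0.6, 0.5],
         [0.6, 1.0, 0.55],
         [0.5, 0.55, 1.0]]
gold_b = [0, 0, 0]             # all pages refer to one person

train = [(sim_a, gold_a), (sim_b, gold_b)]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
threshold = learn_global_threshold(train, candidates)
```

The learned `threshold` would then be applied unchanged to the similarity matrix of a held-out name via `average_link_clusters`. A threshold set too low merges different people into one cluster, and one set too high splits a single person across clusters, which is the trade-off the abstract describes.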