Geotagging: using proximity, sibling, and prominence clues to understand comma groups
Title | Geotagging: using proximity, sibling, and prominence clues to understand comma groups |
Publication Type | Conference Papers |
Year of Publication | 2010 |
Authors | Lieberman MD, Samet H, Sankaranarayanan J |
Conference Name | Proceedings of the 6th Workshop on Geographic Information Retrieval |
Date Published | 2010 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-60558-826-1 |
Keywords | comma groups, geotagging, toponyms |
Abstract | Geotagging is the process of recognizing textual references to geographic locations, known as toponyms, and resolving these references by assigning each lat/long values. Typical geotagging algorithms use a variety of heuristic evidence to select the correct interpretation for each toponym. A study is presented of one such heuristic which aids in recognizing and resolving lists of toponyms, referred to as comma groups. Comma groups of toponyms are recognized and resolved by inferring the common threads that bind them together, based on the toponyms' shared geographic attributes. Three such common threads are proposed and studied --- population-based prominence, distance-based proximity, and sibling relationships in a geographic hierarchy --- and examples of each are noted. In addition, measurements are made of these comma groups' usage and variety in a large dataset of news articles, indicating that the proposed heuristics, and in particular the proximity and sibling heuristics, are useful for resolving comma group toponyms. |
URL | http://doi.acm.org/10.1145/1722080.1722088 |
DOI | 10.1145/1722080.1722088 |
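The three "common threads" named in the abstract (prominence, proximity, and sibling relationships) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Place` record, the thresholds `min_population` and `max_km`, the order in which the heuristics are tried, and the toy gazetteer values are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Place:
    """One candidate interpretation of a toponym (hypothetical record layout)."""
    name: str
    lat: float
    lon: float
    population: int
    parent: str  # containing region in a geographic hierarchy


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def are_siblings(group) -> bool:
    """Sibling thread: every place shares the same parent region."""
    return len({p.parent for p in group}) == 1


def is_proximate(group, max_km: float = 200.0) -> bool:
    """Proximity thread: every pair lies within max_km (assumed threshold)."""
    return all(
        haversine_km(a, b) <= max_km
        for i, a in enumerate(group)
        for b in group[i + 1:]
    )


def is_prominent(group, min_population: int = 1_000_000) -> bool:
    """Prominence thread: every place is populous (assumed threshold)."""
    return all(p.population >= min_population for p in group)


def resolve_comma_group(candidates):
    """Return (thread_name, places) for the first combination of
    interpretations that satisfies a heuristic, or None if none does.
    The order sibling -> proximity -> prominence is an assumption."""
    tests = (("sibling", are_siblings),
             ("proximity", is_proximate),
             ("prominence", is_prominent))
    for label, test in tests:
        for combo in product(*candidates):
            if test(combo):
                return label, combo
    return None


# Toy gazetteer with rough, illustrative values (not taken from the paper).
gazetteer = {
    "Paris": [Place("Paris", 48.86, 2.35, 2_100_000, "France"),
              Place("Paris", 33.66, -95.56, 25_000, "Texas")],
    "Dallas": [Place("Dallas", 32.78, -96.80, 1_300_000, "Texas"),
               Place("Dallas", 33.92, -84.84, 14_000, "Georgia")],
}

# Resolve the comma group "Paris, Dallas".
label, places = resolve_comma_group([gazetteer["Paris"], gazetteer["Dallas"]])
print(label, [(p.name, p.parent) for p in places])
```

In this toy case the comma group "Paris, Dallas" resolves to the two Texas interpretations because they share a parent region, even though Paris, France is far more populous. A real geotagger would draw candidates from a full gazetteer and weigh this evidence alongside other heuristics.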