A Spatially-Aware Data-Driven Approach to Automatically Geocoding Non-Gazetteer Place Names
Authors: Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi
Journal: ACM Transactions on Spatial Algorithms and Systems
Publication date: 2023-10-16
Publication type: Journal Article
DOI: 10.1145/3627987 (https://doi.org/10.1145/3627987)
Citations: 0
Abstract
Human and natural processes such as navigation and natural calamities are intrinsically linked to geographic space and are described using place names. Extracting place names from text and then geocoding them is critical for understanding the onset, progression, and end of these processes. Geocoding place names extracted from text requires an external knowledge base such as a gazetteer. However, a standard gazetteer is typically incomplete. Additionally, widely used approaches to place name geocoding (also known as toponym resolution) generally focus on geocoding ambiguous but known gazetteer place names. Hence there is a need for an approach that automatically geocodes non-gazetteer place names. In this research, we demonstrate that patterns in place names are not spatially random. Places are often named after the people, geography, and history of an area, so nearby place names exhibit a degree of similarity. Similarly, places that co-occur in text are likely to be spatially proximate, as they provide geographic reference to common events. We propose a novel data-driven, spatially-aware algorithm, Bhugol, that leverages the spatial patterns and the spatial context of place names to automatically geocode non-gazetteer place names. The efficacy of Bhugol is demonstrated on two diverse geographic areas: the USA and India. The results show that Bhugol outperforms well-known state-of-the-art geocoders.
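The two signals named in the abstract, name similarity and textual co-occurrence, can be illustrated with a small sketch. This is not the Bhugol algorithm, whose details the abstract does not give; it is a hypothetical toy heuristic that places an unknown name at a weighted centroid of the known places it co-occurs with, giving extra weight to places with similar names. The function names, the character-trigram similarity measure, and the `1 + similarity` weighting are all assumptions made for illustration. Averaging latitude and longitude directly is also a simplification that only makes sense for small regions away from the antimeridian.

```python
from __future__ import annotations


def char_ngrams(name: str, n: int = 3) -> set[str]:
    """Character n-grams of a lowercased name, padded with '#' at both ends."""
    padded = f"#{name.lower()}#"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def name_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity between the character n-gram sets of two names."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def estimate_location(
    unknown_name: str,
    cooccurring: list[str],
    gazetteer: dict[str, tuple[float, float]],
) -> tuple[float, float] | None:
    """Hypothetical heuristic (not Bhugol): weighted centroid of the known
    places that co-occur with the unknown name in the same text."""
    total = lat_sum = lon_sum = 0.0
    for place in cooccurring:
        if place not in gazetteer:
            continue  # only gazetteer entries have coordinates
        lat, lon = gazetteer[place]
        # Assumed weighting: every co-occurring place counts once, and
        # places with similar names count a little more.
        weight = 1.0 + name_similarity(unknown_name, place)
        total += weight
        lat_sum += weight * lat
        lon_sum += weight * lon
    if total == 0.0:
        return None  # no known co-occurring place to anchor the estimate
    return lat_sum / total, lon_sum / total
```

For example, with a toy gazetteer containing approximate coordinates for "Lincoln" and "Omaha", calling `estimate_location("Lincoln Heights", ["Lincoln", "Omaha"], gazetteer)` returns a point between the two cities that lies closer to Lincoln, because the shared trigrams raise Lincoln's weight.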
About the journal:
ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It takes a multi-disciplinary perspective, spanning a large number of areas where spatial data is manipulated or visualized (regardless of whether it is specified geometrically or textually), such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, and security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization where spatial, geospatial, and spatiotemporal data is central.