Carmen Brando, Catherine Dominguès, Magali Capeyron
Ongoing initiatives promoted by cultural institutions and public administrations are developing textual corpora contributed by the general public. In this work, we deal with a spoken corpus of life stories and a crowd-sourced Web corpus of people's contributions related to urban planning issues in their city. Located information constitutes an essential component of these corpora. Toponyms refer to official names (e.g. Congo), which are listed in gazetteers, but often also to generic locations such as un endroit très beau (a beautiful place). Because of the nature of the corpora, these generic locations are inherently subjective, vague and descriptive. To enable automated exploitation of these texts, it is crucial to detect such place mentions properly. To this end, the present work provides a comparative study of state-of-the-art NER systems, most importantly supervised tools such as Stanford NER, for the identification of generic locations in thematic corpora.
{"title":"Evaluation of NER systems for the recognition of place mentions in French thematic corpora","authors":"Carmen Brando, Catherine Dominguès, Magali Capeyron","doi":"10.1145/3003464.3003471","DOIUrl":"https://doi.org/10.1145/3003464.3003471","url":null,"abstract":"Ongoing initiatives promoted by cultural institutions and public administrations engage in the development of textual corpora issued from the general public. In this work, we deal with a spoken corpus of life stories and a crowd-sourced Web corpus of people's contributions related to urban planning issues in their city. Located information constitutes an essential component in these corpora. Toponyms refer to official names (e.g. Congo) which are listed in gazetteers but often to generic locations such as un endroit très beau (a beautiful place). Because of the nature of the corpora, these generic locations are inherently subjective, vague and descriptive. For enabling automated exploitation of these texts, it is crucial to properly detect such kinds of place mentions. In this sense, the present work provides a comparative study of state-of-art NER1 systems, most importantly of supervised tools such as Stanford NER, for the identification of generic locations in thematic corpora.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"21 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132267279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local news articles are an important source of knowledge about local events, place-specific culture, and people's thoughts about their environment. Reliable geocoding of such articles is the first step towards unlocking such local knowledge for community engagement and development. However, existing geo-referencing methods and tools do not work well for local news because they do not reflect the ways local people encode and communicate geographical knowledge. This paper argues that local news requires a different method and infrastructure support for effective geo-referencing. To gain insights into the unique aspects of local gazetteers and the nature of ambiguities, we present an analysis of a collection of local news articles. We found that place references in local news have their own special vocabulary, and that their ambiguities are handled differently by local people. We translated these insights into a gazetteer-based geocoding solution that combines progressive geocoding with a smart footprint recommender. The progressive geocoding service uses Nominatim (OpenStreetMap) as the initial gazetteer to jump-start the construction of a local gazetteer for a community and by the community. LocusRecommender automatically suggests the best matches from the gazetteer, ranked by a set of heuristic rules. Preliminary evaluation shows that our smart footprint recommender predicts 80% of the answers within its top three recommendations.
{"title":"Towards geo-referencing infrastructure for local news","authors":"Guoray Cai, Ye Tian","doi":"10.1145/3003464.3003473","DOIUrl":"https://doi.org/10.1145/3003464.3003473","url":null,"abstract":"Local news articles are an important source of knowledge about local events, place-specific culture, and peoples' thoughts about their environment. Reliable geocoding of such articles is the first step towards unlocking such local knowledge for community engagement and development. However, existing geo-referencing methods and tools do not work well for local news because they do not reflect the ways local people encode and communicate geographical knowledge. This paper argues that local news requires a different method and infrastructure support for effective geo-referencing. To gain insights on the unique aspects of local gazetteers and the nature of ambiguities, we present an analysis of a collection of local new articles. We found that place references in local news have their special vocabulary, and that their ambiguities are handled differently by local people. We translated such insights into a gazetteer-based geocoding solution that combines progressive geocoding with a smart footprint recommender. Progressive geocoding service uses Nominatim (OpenStreetMap) as the initial gazetteer to jump-start the construction of local gazetteer for a community and by the community. LocusRecommender automatically suggests the best matches from gazetteer ranked by a set of heuristic rules. Preliminary evaluation shows that our smart footprint recommender predicts 80% of the answers by its top-three recommendations.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126406191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we point out the shortcomings of precision and recall in evaluating the performance of geoparsing algorithms. We propose separate processes for evaluating the toponym recognition and toponym resolution stages, and also propose new metrics that quantify the performance of toponym resolution.
{"title":"Performance evaluation measures for toponym resolution","authors":"M. Karimzadeh","doi":"10.1145/3003464.3003472","DOIUrl":"https://doi.org/10.1145/3003464.3003472","url":null,"abstract":"In this paper, we point out to the shortcomings of precision and recall in evaluating the performance of geoparsing algorithms. We propose separate processes for evaluating toponym recognition and toponym resolution stages, and also propose new metrics that quantify the performance of toponym resolution.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122838122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
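As background for the precision and recall measures that the abstract above discusses, the following is a minimal sketch of how they are commonly computed for toponym recognition. The span representation (character offsets), the exact-match convention, the function name and the sample data are all illustrative assumptions; none of them is taken from the paper.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of toponym spans.

    Each span is a hypothetical (start, end) character-offset pair.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up annotations: three gold toponyms, four system predictions,
# two of which match a gold span exactly.
gold_spans = {(0, 5), (17, 23), (40, 46)}
predicted_spans = {(0, 5), (17, 23), (30, 35), (50, 58)}

p, r = precision_recall(predicted_spans, gold_spans)
print(f"precision={p:.2f} recall={r:.2f}")
```

Scores of this kind only count exact matches between recognized and annotated mentions; on their own they do not measure, for instance, how far a resolved location lies from the true one.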
Nevena Golubovic, C. Krintz, R. Wolski, Sara Lafia, T. Hervey, W. Kuhn
Farmers face pressure to respond to unpredictable weather, the spread of pests, and other variable events on their farms. This paper proposes a framework for data aggregation from diverse sources that extracts named places impacted by events relevant to agricultural practices. Our vision is to couple natural language processing, geocoding, and existing geographic information retrieval techniques to increase the value of already-available data through aggregation, filtering, validation, and notifications, helping farmers make timely and informed decisions with greater ease.
{"title":"Extracting spatial information from social media in support of agricultural management decisions","authors":"Nevena Golubovic, C. Krintz, R. Wolski, Sara Lafia, T. Hervey, W. Kuhn","doi":"10.1145/3003464.3003468","DOIUrl":"https://doi.org/10.1145/3003464.3003468","url":null,"abstract":"Farmers face pressure to respond to unpredictable weather, the spread of pests, and other variable events on their farms. This paper proposes a framework for data aggregation from diverse sources that extracts named places impacted by events relevant to agricultural practices. Our vision is to couple natural language processing, geocoding, and existing geographic information retrieval techniques to increase the value of already-available data through aggregation, filtering, validation, and notifications, helping farmers make timely and informed decisions with greater ease.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126817152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a categorization algorithm for textual content descriptions, such as tags for images from social media or crowdsourcing services, to identify the characteristics of places. The algorithm is based on spatial coverage and multi-facet categorization. We describe how it can be applied to process images from Flickr individually in order to extract geo-spatial knowledge. It is particularly suited to places with a small number of photos. The extraction process uses categorization rules based on geographic and terminological knowledge resources.
{"title":"Semantic enrichment of places with VGI sources: a knowledge based approach","authors":"Camille Tardy, G. Falquet, L. Moccozet","doi":"10.1145/3003464.3003470","DOIUrl":"https://doi.org/10.1145/3003464.3003470","url":null,"abstract":"We propose a categorization algorithm for text content description such as tags for images from social media or crowd sourcing services, to identify places characteristics. The algorithm is based on a spatial coverage and a multi-facets categorization. We describe how it can be applied to individually process images from Flickr in order to extract geo-spatial knowledge. It is particularly dedicated for places with a small number of photos. The extraction process is done using categorization rules based on geographic and terminological knowledge resources.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130619247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic alignment of heterogeneous geographical data from different sources performs unsatisfactorily on the Geographical Semantic Web (GSW) because of the flat structure of the GSW and the influence of spatial features. To solve this problem, this paper proposes a holistic framework for GSW alignment. The framework first produces initial matches for classes, properties and instances using an approval voting strategy, and then refines these results through a mutual cooperation mechanism. In particular, spatial distance and a spatial index are introduced to align instances and to improve the performance of class and property alignment. To demonstrate its ability, the framework is tested with two real GSWs. Compared with the state-of-the-art holistic alignment system PARIS, this framework obtains a large number of matched pairs. The F1 values for class, property and instance alignment are 0.562, 0.545 and 0.646 respectively, all of which are higher than those of PARIS.
{"title":"A holistic framework of geographical semantic web aligning","authors":"Li Yu, Xiliang Liu, Mingxiao Li, Peng Peng, F. Lu","doi":"10.1145/3003464.3003465","DOIUrl":"https://doi.org/10.1145/3003464.3003465","url":null,"abstract":"Semantic aligning of heterogeneous geographical data from different sources behaves unsatisfactory on Geographical Semantic Web (GSW) due to the flat structure of GSW and the influence of spatial features. To solve this problem, this paper proposes a holistic framework for GSW aligning. This holistic framework firstly produces the initial matched results respectively for classes, properties and instances by the approval voting strategy, and then enhances these results by the mutual cooperating mechanism. Especially, spatial distance and spatial index are introduced to align instances and to improve the performance of aligning class and aligning property. To demonstrate its ability, this holistic framework is tested with two real GSWs. Compared with the state-of-the-art holistic alignment system, namely PARIS, this framework gains a large number of matched pairs. The Fl values of aligning class, aligning property and aligning instance respectively are 0.562, 0.545 and 0.646, all of which are higher than PARIS's.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134225462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The work in this paper is motivated by two different perspectives. First, gazetteers, an important data source for Geographic Information Retrieval (GIR) applications, often lack historic place name information. More focused historic gazetteers are far from complete and often specialize in certain geographic regions or time periods only. Second, research on historic route descriptions (so-called itineraries) is an important task in many disciplines such as geography, linguistics, history, religion, or even medicine. This research on historic itineraries is characterized by manual, time-consuming work with only minimal IT support through gazetteers and map services. We address both perspectives and present a depth-first branch-and-bound (DFBnB) algorithm for deducing historic place names, and thus the stops of ancient travel routes, from itinerary tables. Multiple phonetic and character-based string distances are evaluated when resolving parts of an itinerary first published in 1563.
{"title":"A depth-first branch-and-bound algorithm for geocoding historic itinerary tables","authors":"Daniel Blank, A. Henrich","doi":"10.1145/3003464.3003467","DOIUrl":"https://doi.org/10.1145/3003464.3003467","url":null,"abstract":"The work in this paper is motivated from two different perspectives: First, gazetteers as an important data source for Geographic Information Retrieval (GIR) applications often lack historic place name information. More focused historic gazetteers are a far cry from being complete and often specialize only on certain geographic regions or time periods. Second, research on historic route descriptions---so called itineraries---is an important task in many research disciplines such as geography, linguistics, history, religion, or even medicine. This research on historic itineraries is characterized by manual, time-consuming work with only minimalistic IT support through gazetteers and map services. We address both perspectives and present a depth-first branch-and-bound (DFBnB) algorithm for deducing historic place names and thus the stops of ancient travel routes from itinerary tables. Multiple phonetic and character-based string distances are evaluated when resolving parts of an itinerary first published in 1563.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116608818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
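The abstract above mentions character-based string distances for matching historic place names. As a minimal sketch of one such distance, the code below implements Levenshtein edit distance and uses it to rank gazetteer candidates. Levenshtein is chosen here purely for illustration: the abstract does not say which distances the paper evaluates, and the spellings and the tiny gazetteer are made-up examples.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# Hypothetical historic spelling and a toy gazetteer of modern names.
historic_name = "Nurnberg"
gazetteer = ["Nuremberg", "Bamberg", "Augsburg"]

# Pick the candidate with the smallest edit distance.
best_match = min(gazetteer, key=lambda name: levenshtein(historic_name, name))
print(best_match, levenshtein(historic_name, best_match))
```

In a branch-and-bound setting, a distance like this could serve as one component of the cost used to rank and prune candidate places, though how the paper combines its distances is not stated in the abstract.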
Kevin A. Sparks, Roger G. Li, Gautam Thakur, R. Stewart, M. Urban
Advances in technology have continually advanced our understanding of where people are, how they use the environment around them, and why they are at their current location. Better knowledge of when various locations become popular through space and time could have a large impact on research fields like urban dynamics and energy consumption. In this paper, we discuss the ability to identify and locate various facility types (e.g. restaurants, airports, stadiums) using social media, and assess methods for determining when these facilities become popular over time. We use standard natural language processing tools and machine learning classifiers to interpret geotagged Twitter text and determine whether a user was apparently at a location of interest when the tweet was sent. On average our classifiers are approximately 85% accurate, varying across facility types, with a peak precision of 98%. By using these standard methods to classify unstructured text, geotagged social media data can be an extremely useful tool for better understanding the composition of places and how and when people use them.
{"title":"Facility detection and popularity assessment from text classification of social media and crowdsourced data","authors":"Kevin A. Sparks, Roger G. Li, Gautam Thakur, R. Stewart, M. Urban","doi":"10.1145/3003464.3003466","DOIUrl":"https://doi.org/10.1145/3003464.3003466","url":null,"abstract":"Advances in technology have continually progressed our understanding of where people are, how they use the environment around them, and why they are at their current location. Having a better knowledge of when various locations become popular through space and time could have large impacts on research fields like urban dynamics and energy consumption. In this paper, we discuss the ability to identify and locate various facility types (e.g. restaurant, airport, stadiums) using social media, and assess methods in determining when these facilities become popular over time. We use standard natural language processing tools and machine learning classifiers to interpret geotagged Twitter text and determine if a user is seemingly at a location of interest when the tweet was sent. On average our classifiers are approximately 85% accurate varying across multiple facility types, with a peak precision of 98%. By using these standard methods to classify unstructured text, geotagged social media data can be an extremely useful tool to better understanding the composition of places and how and when people use them.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133608790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Spitz, Johanna Geiß, Michael Gertz, Stefan Hagedorn, K. Sattler
Events as composites of temporal, spatial and actor information are a central object of interest in many information retrieval (IR) scenarios. There are several challenges to such event-centric IR, which range from the detection and extraction of geographic, temporal and actor mentions in documents to the construction of event descriptions as triples of locations, dates, and actors that can support event query scenarios. For the latter challenge, existing approaches fall short when dealing with imprecise event components. For example, if the exact location or date is unknown, existing IR methods are often unaware of different granularity levels and the conceptual proximity of dates or locations. To address these problems, we present a framework that efficiently answers imprecise event queries, whose geographic or temporal component is given only at a coarse granularity level. Our approach utilizes a network-based event model that includes location, date, and actor components that are extracted from large document collections. Instances of entity and event mentions in the network are weighted based on both their frequency of occurrence and textual distance to reflect semantic relatedness. We demonstrate the utility and flexibility of our approach for evaluating imprecise event queries based on a large collection of events extracted from the English Wikipedia for a ground truth of news events.
{"title":"Refining imprecise spatio-temporal events: a network-based approach","authors":"Andreas Spitz, Johanna Geiß, Michael Gertz, Stefan Hagedorn, K. Sattler","doi":"10.1145/3003464.3003469","DOIUrl":"https://doi.org/10.1145/3003464.3003469","url":null,"abstract":"Events as composites of temporal, spatial and actor information are a central object of interest in many information retrieval (IR) scenarios. There are several challenges to such event-centric IR, which range from the detection and extraction of geographic, temporal and actor mentions in documents to the construction of event descriptions as triples of locations, dates, and actors that can support event query scenarios. For the latter challenge, existing approaches fall short when dealing with imprecise event components. For example, if the exact location or date is unknown, existing IR methods are often unaware of different granularity levels and the conceptual proximity of dates or locations. To address these problems, we present a framework that efficiently answers imprecise event queries, whose geographic or temporal component is given only at a coarse granularity level. Our approach utilizes a network-based event model that includes location, date, and actor components that are extracted from large document collections. Instances of entity and event mentions in the network are weighted based on both their frequency of occurrence and textual distance to reflect semantic relatedness. We demonstrate the utility and flexibility of our approach for evaluating imprecise event queries based on a large collection of events extracted from the English Wikipedia for a ground truth of news events.","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 10th Workshop on Geographic Information Retrieval","authors":"","doi":"10.1145/3003464","DOIUrl":"https://doi.org/10.1145/3003464","url":null,"abstract":"","PeriodicalId":308638,"journal":{"name":"Proceedings of the 10th Workshop on Geographic Information Retrieval","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115843426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}