
Proceedings of the 8th Workshop on Geographic Information Retrieval: Latest Articles

Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675701
J. O. Wallgrün, F. Hardisty, A. MacEachren, M. Karimzadeh, Yiting Ju, Scott Pezanowski
This article presents an approach to place reference corpus building and application of the approach to a Geo-Microblog Corpus that will foster research and development in the areas of microblog/twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names. 30% of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and present an analysis of causes of disagreement between the lay persons performing place identification in our crowdsourcing approach.
Citations: 19
Itinerary retrieval: travelers, like traveling salesmen, prefer efficient routes
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675355
M. Adelfio, H. Samet
Internet users share large quantities of text and multimedia content that becomes easily accessible to others via hyperlinks and search engine results. However, structured datasets generally lack this level of exposure. One example is the travel itinerary, which many Internet users post online in the form of a spreadsheet or web page table, yet the collection of such itineraries remains difficult to search or browse due to insufficient parsing and indexing by search engines. Enabling interaction with user-uploaded itineraries could provide valuable information to trip planners who are researching travel options and to businesses attempting to understand travel patterns. This work examines the challenges of identifying and extracting itineraries from spreadsheets and web page tables to support such applications, with a focus on differentiating between itineraries and other documents with geographic content.
Citations: 12
Integration of linked data sources for gazetteer expansion
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675357
T. Moura, C. Davis
The determination of the geographic scope of documents is important for many applications in geographic information retrieval (GIR). Many techniques require the use of gazetteers as a source of reference data. However, creating and maintaining gazetteers is still a complex and demanding task. We propose using linked data sources to put together gazetteer data that can be both broad (e.g., planetary) and deep (e.g., down to urban detail). Linked data sources also allow enriching the resulting gazetteer with a set of geographic and semantic relationships involving place names and other geographic and non-geographic terms, thus expanding the possibilities for solving typical GIR problems such as disambiguation and filtering. This work shows the results of efforts to combine two linked data sources of gazetteer data, namely GeoNames and DBPedia, to populate an integrated and semantically-enriched gazetteer. We used evidence contained in attributes, such as Wikipedia URLs, Linked Data predicates that indicate that places in both sources are the same, and some additional criteria. The resulting gazetteer contains 8,729,833 places, of which 426,317 are found in both data sources. This relatively small overlap is analyzed, indicating that GeoNames and DBPedia are complementary, covering typically different classes of places, thus leading to the idea that further expansion can be achieved by integrating gazetteer data from additional Linked Data sources.
Citations: 7
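The abstract above mentions using a shared Wikipedia URL as one piece of evidence that records from two sources denote the same place. The following is a minimal sketch of that single idea, not the authors' implementation: the `Place` record, its field names, the `integrate` function, and the sample values in the usage note are all hypothetical, and the Linked Data predicates and additional criteria mentioned in the abstract are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Place:
    """Hypothetical gazetteer record; not the schema of any real source."""
    source: str                    # label of the originating data source
    source_id: str                 # identifier within that source
    name: str
    wikipedia_url: Optional[str]   # None when the record has no such link


def normalize_url(url: str) -> str:
    """Strip the scheme and any trailing slash so equivalent URLs compare equal."""
    url = url.strip()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    return url.rstrip("/")


def integrate(
    source_a: List[Place], source_b: List[Place]
) -> Tuple[List[Tuple[Place, Place]], List[Place], List[Place]]:
    """Pair records that share a Wikipedia URL; return (matched, only_a, only_b)."""
    by_url = {
        normalize_url(p.wikipedia_url): p for p in source_a if p.wikipedia_url
    }
    matched, only_b, matched_keys = [], [], set()
    for place in source_b:
        key = normalize_url(place.wikipedia_url) if place.wikipedia_url else None
        if key is not None and key in by_url:
            matched.append((by_url[key], place))
            matched_keys.add(key)
        else:
            only_b.append(place)
    only_a = [
        p for p in source_a
        if not p.wikipedia_url or normalize_url(p.wikipedia_url) not in matched_keys
    ]
    return matched, only_a, only_b
```

Under these assumptions, the size of the integrated gazetteer is `len(matched) + len(only_a) + len(only_b)`, and `len(matched)` is the overlap between the two sources.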
Indirect location recommendation
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675697
André Sabino, A. Rodrigues
Recommending interesting locations to users is a challenge for social and productive networks. The evidence of the content produced by users must be considered in this task, which may be simplified by the use of the meta-data associated with the content, i.e., the categorization supported by the network -- descriptive keywords and geographic coordinates. In this paper we present an extension to a productive network representation model, originally designed to discover indirect keywords. Our extension adds a spatial dimension to the information that represents the user production, enabling indirect location discovery methods through the interpretation of the network as a graph, solely relying on keywords and locations that categorize or describe productive items. The model and indirect location discovery methods presented in this paper avoid content analysis, and are a new step towards a generic approach to the identification of relevant information, otherwise hidden from the users. The evaluation of the model extension and methods is accomplished by an experiment that performs a classification analysis over the Twitter network. The results show that we can efficiently recommend locations to users.
Citations: 3
Testing a model of witness accounts in social media
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675699
M. Truelove, M. Vasardani, S. Winter
Identifying micro-bloggers who are likely witnesses to events is beneficial in numerous applications, including event detection and credibility assessment. This paper presents research in progress on testing a conceptual model, which defines witness and related accounts from micro-blogs about events. The case study events considered have varying spatial and temporal characteristics, and include a shark sighting, a music concert, a protest, and a cyclone. Results indicate that witnessing characteristics are influenced by numerous factors in addition to the spatial and temporal characteristics of the events, including the motivation of the witnesses themselves. Additionally, the results suggest enhancements to the conceptual model to provide a more sophisticated generic implementation, and insights for future automation approaches.
Citations: 13
Estimating the semantic type of events using location features from Flickr
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675700
Steven Van Canneyt, S. Schockaert, B. Dhoedt
Various methods for automatically detecting events from social media have been developed in recent years. However, little progress has been made towards extracting structured representations of such events, which severely limits the way in which the resulting event databases can be queried. As a first step to address this issue, we focus on the problem of discovering the semantic type of events. While current methods are almost exclusively based on bag-of-words methods, we show that additionally using location features can substantially improve the results. In particular, we use the tags associated with Flickr photos and the types of the known events near the venue of the event as context information.
Citations: 3
What, where, and when: keyword search with spatio-temporal ranges
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675358
Sergey Nepomnyachiy, Bluma S. Gelley, Wei Jiang, Tehila Minkus
With the adoption of timestamps and geotags on Web data, search engines are increasingly being asked questions of "where" and "when" in addition to the classic "what." In the case of Twitter, many tweets are tagged with location information as well as timestamps, creating a demand for query processors that can search both of these dimensions along with text. We propose 3W, a search framework for geo-temporal stamped documents. It exploits the structure of time-stamped data to dramatically shrink the temporal search space and uses a shallow tree based on the spatial distribution of tweets to allow speedy search over the spatial and text dimensions. Our evaluation on 30 million tweets shows that the prototype system outperforms the baseline approach that uses a monolithic index.
Citations: 17
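To make the "what, where, when" query in the abstract above concrete, here is a minimal sketch of one way to exploit time-ordered data: keep documents sorted by timestamp, narrow the candidates with a binary search on the time range, then filter by bounding box and keyword. This is an illustration under stated assumptions and not the 3W system: the `Doc` and `TimeSortedIndex` names are hypothetical, and a plain linear filter stands in for the shallow spatial tree the abstract describes.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Doc:
    """Hypothetical geo-temporally stamped document."""
    timestamp: int   # e.g. seconds since the epoch
    lat: float
    lon: float
    text: str


class TimeSortedIndex:
    """Keeps documents ordered by time so a time range maps to one slice."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = sorted(docs, key=lambda d: d.timestamp)
        self._times = [d.timestamp for d in self._docs]

    def search(
        self,
        keyword: str,
        t_start: int,
        t_end: int,
        bbox: Tuple[float, float, float, float],
    ) -> List[Doc]:
        """Return docs in [t_start, t_end] inside bbox that contain keyword.

        bbox is (min_lat, min_lon, max_lat, max_lon); all bounds are inclusive.
        """
        # Binary search shrinks the candidate set to the requested time window.
        lo = bisect.bisect_left(self._times, t_start)
        hi = bisect.bisect_right(self._times, t_end)
        min_lat, min_lon, max_lat, max_lon = bbox
        needle = keyword.lower()
        # Linear spatial and text filter over the (hopefully small) time slice.
        return [
            d for d in self._docs[lo:hi]
            if min_lat <= d.lat <= max_lat
            and min_lon <= d.lon <= max_lon
            and needle in d.text.lower().split()
        ]
```

A query such as `index.search("parade", 100, 200, (40.0, -75.0, 41.0, -73.0))` only inspects documents whose timestamps fall in the window, which is the part of the idea this sketch is meant to show.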
Building a corpus of spatial relational expressions extracted from web documents
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675702
J. O. Wallgrün, A. Klippel, Timothy Baldwin
Spatial language, despite decades of research, still poses substantial challenges for automated systems, for instance in geographic information retrieval or human-robot interaction. We describe an approach to building a corpus of natural language expressions extracted from web documents for analyzing and modeling spatial relational expressions (SRE). The unique characteristic of this corpus is that it is built around georeferenced triplets, with each triplet containing two entities (including their latitude/longitude coordinates) related by a spatial expression such as near. While the approach is still experimental, our first results are promising, in that we believe they will form the foundation for a comprehensive contextualized model for interpreting spatial natural language expressions. For the time being, we are focusing on a single domain, hotel reviews. This domain restriction allowed us to implement a proof-of-concept that this approach, with advances in natural language technologies, will indeed deliver a comprehensive corpus. The potential to collect larger corpora, and associated challenges, is discussed.
Citations: 22
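The georeferenced triplets described in the abstract above (two entities with latitude/longitude coordinates, related by a spatial expression such as near) can be pictured with a small data structure. The sketch below is an assumption about how such a triplet might be represented, not the corpus format used by the authors; the `Entity`, `SpatialTriplet`, and `distances_by_relation` names are hypothetical. It also groups great-circle distances by relation term, one simple way such a corpus could be analyzed, using the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """Hypothetical georeferenced entity."""
    name: str
    lat: float   # degrees
    lon: float   # degrees


@dataclass(frozen=True)
class SpatialTriplet:
    """Two entities linked by a spatial relational expression, e.g. 'near'."""
    subject: Entity
    relation: str
    reference: Entity


def haversine_km(a: Entity, b: Entity) -> float:
    """Great-circle distance in kilometres between two entities."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def distances_by_relation(triplets: List[SpatialTriplet]) -> Dict[str, List[float]]:
    """Collect the subject-to-reference distance of every triplet, keyed by relation."""
    grouped: Dict[str, List[float]] = {}
    for t in triplets:
        grouped.setdefault(t.relation, []).append(haversine_km(t.subject, t.reference))
    return grouped
```

With many triplets per relation, the distance lists returned by `distances_by_relation` could be summarized (for example by median) to see what range of distances a term such as near covers in a given domain.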
Characterization of toponym usages in texts
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675703
S. Wolf, A. Henrich, Daniel Blank
Toponyms in texts and search queries are often used figuratively and do not directly refer to the locations they reference in their literal sense. Different usage kinds and stylistic devices characterize toponym usages in texts. It is thus crucial for a Geographic Information Retrieval (GIR) system to precisely distinguish these different toponym usages at indexing and at query time in order to best address a given information need and the geospatial footprint of a document. For that purpose, we analyze which of the classic stylistic devices such as allegories, metaphors, or metonymies are used together with toponyms. We use these categories as a foundation for a systematic approach towards the characterization of toponym usages in texts, which we believe is necessary to further boost the retrieval effectiveness of future GIR systems. A prototype implements this characterization, by way of example, for texts written in German. We evaluate the effectiveness of our approach against a reference corpus to show the general feasibility. Our approach provides a basis for a wide range of more sophisticated applications such as text genre detection.
Citations: 3
Using minimaps to enable toponym resolution with an effective 100% rate of recall
Pub Date : 2014-11-04 DOI: 10.1145/2675354.2675698
H. Samet
A number of systems have been recently constructed that make use of a map query interface to access documents by the locations that they mention. These mentions are often ambiguous in the sense that many interpretations exist for the locations which are not always expressed along with all the necessary qualifiers. In other words, users are assumed to be able to make the appropriate identification based either on knowledge of prior queries or the nature of the document containing the references as well as knowledge of the target audience. The disambiguation process is known as toponym resolution. The map query interface results in the placement of icons and links to the appropriate documents at the corresponding location on the map. Assuming that all toponyms have been recognized (i.e., 100% rate of recall for toponym recognition), it is shown how to achieve an effective 100% rate of recall for toponym resolution for all interpretations of a toponym that the toponym recognition process associates with at least one document. This is done with the aid of a minimap that shows all of these interpretations which means that a user has access to all documents that mention a specific location as long as the textual specification to the location has been recognized as a location rather than as the name of another entity such as a person, company, organization, etc. It also assumes that the user is capable of determining the correct interpretation of each toponym. This is important as it enables the determination of precision and recall.
Citations: 19
Journal
Proceedings of the 8th Workshop on Geographic Information Retrieval