Carmen Brando, Catherine Dominguès, Magali Capeyron
Evaluation of NER systems for the recognition of place mentions in French thematic corpora
Proceedings of the 10th Workshop on Geographic Information Retrieval, 2016-10-31
DOI: 10.1145/3003464.3003471 (https://doi.org/10.1145/3003464.3003471)
Citation count: 7
Abstract
Ongoing initiatives promoted by cultural institutions and public administrations are building textual corpora from contributions by the general public. In this work, we deal with two such corpora: a spoken corpus of life stories and a crowd-sourced Web corpus of people's contributions on urban planning issues in their city. Location information is an essential component of these corpora. Place mentions may be toponyms, that is, official names (e.g. Congo) listed in gazetteers, but they are often generic locations such as un endroit très beau (a very beautiful place). Because of the nature of the corpora, these generic locations are inherently subjective, vague and descriptive. To enable automated exploitation of these texts, it is crucial to detect such place mentions properly. To this end, the present work provides a comparative study of state-of-the-art named entity recognition (NER) systems, most importantly supervised tools such as Stanford NER, for the identification of generic locations in thematic corpora.