Improving Named Entity Linking Corpora Quality

Recent Advances in Natural Language Processing, Pub Date: 2019-10-22, DOI: 10.26615/978-954-452-056-4_152

A. Weichselbraun, Adrian M. P. Braşoveanu, P. Kuntschik, L. Nixon


Citations: 7

Abstract



Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the corpora and of the evaluation metrics used is therefore crucial in this process. We assess the quality of three popular evaluation corpora and identify four major issues that affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures better comparability of evaluation results.
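
To make the ideas of formalized annotations, lenses, and clearly defined scoring rules concrete, the following is a minimal illustrative sketch and not the authors' formalization. It assumes an annotation is a character span plus a knowledge base identifier, that a "lens" can be approximated by a function that rewrites identifiers (for example, to follow Knowledge Base evolution), and that scoring uses strict matching of span and identifier. The names `Annotation`, `apply_uri_lens`, `score`, and the `kb:` identifiers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One entity link: a character span and the knowledge base entry it points to."""
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    uri: str    # knowledge base identifier (hypothetical values below)


def apply_uri_lens(annotations: set[Annotation], redirects: dict[str, str]) -> set[Annotation]:
    """Assumed, simplified 'lens': rewrite outdated identifiers via a redirect map."""
    return {
        Annotation(a.start, a.end, redirects.get(a.uri, a.uri))
        for a in annotations
    }


def score(gold: set[Annotation], predicted: set[Annotation]) -> tuple[float, float, float]:
    """Strict scoring rule: a prediction counts only if span and identifier both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Gold standard created against an older knowledge base version.
gold = {
    Annotation(0, 6, "kb:Wien"),
    Annotation(20, 27, "kb:Austria"),
}

# System output that uses the current identifier for the first entity.
predicted = {
    Annotation(0, 6, "kb:Vienna"),
    Annotation(20, 27, "kb:Australia"),  # right span, wrong entity
}

# Without the lens, the renamed entity is counted as an error.
raw_scores = score(gold, predicted)

# With the lens, the gold standard is mapped to the current identifiers first.
updated_gold = apply_uri_lens(gold, {"kb:Wien": "kb:Vienna"})
lens_scores = score(updated_gold, predicted)
```

In this toy setup the same system output scores differently depending on which corpus configuration it is compared against, which is the kind of comparability problem the abstract describes.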


Source journal

Recent Advances in Natural Language Processing

Self-citation rate

0.00%

Publication count