维基百科式实体标注的多源表示增强

2022 International Joint Conference on Neural Networks (IJCNN) Pub Date : 2022-07-18 DOI:10.1109/IJCNN55064.2022.9892289

Kunyuan Pang, Shasha Li, Jintao Tang, Ting Wang

{"title":"维基百科式实体标注的多源表示增强","authors":"Kunyuan Pang, Shasha Li, Jintao Tang, Ting Wang","doi":"10.1109/IJCNN55064.2022.9892289","DOIUrl":null,"url":null,"abstract":"Entity annotation in Wikipedia (officially named wikilinks) greatly benefits human end-users. Human editors are required to select all mentions that are most helpful to human end-users and link each mention to a Wikipedia page. We aim to design an automatic system to generate Wikipedia-style entity annotation for any plain text. However, existing research either rely heavily on mention-entity map or are restricted to named entities only. Besides, they neglect to select the appropriate mentions as Wikipedia requires. As a result, they leave out some necessary annotation and introduce excessive distracting annotation. Existing benchmarks also skirt around the coverage and selection issues. We propose a new task called Mention Detection and Se-lection for entity annotation, along with a new benchmark, WikiC, to better reflect annotation quality. The task is coined centering mentions specific to each position in high-quality human-annotated examples. We also proposed a new framework, DrWiki, to fulfill the task. We adopt a deep pre-trained span selection model inferring directly from plain text via tokens' context embedding. It can cover all possible spans and avoid limiting to mention-entity maps. In addition, information of both inarguable mention-entity pairs, and mention repeat has been introduced as token-wise representation enhancement by FLAT attention and repeat embedding respectively. Empirical results on WikiC show that, compared with often adopted and state-of-the-art Entity Linking and Entity Recognition methods, our method achieves improvement to previous methods in overall performance. Additional experiments show that DrWiki gains improvement even with a low-coverage mention-entity map.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-source Representation Enhancement for Wikipedia-style Entity Annotation\",\"authors\":\"Kunyuan Pang, Shasha Li, Jintao Tang, Ting Wang\",\"doi\":\"10.1109/IJCNN55064.2022.9892289\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Entity annotation in Wikipedia (officially named wikilinks) greatly benefits human end-users. Human editors are required to select all mentions that are most helpful to human end-users and link each mention to a Wikipedia page. We aim to design an automatic system to generate Wikipedia-style entity annotation for any plain text. However, existing research either rely heavily on mention-entity map or are restricted to named entities only. Besides, they neglect to select the appropriate mentions as Wikipedia requires. As a result, they leave out some necessary annotation and introduce excessive distracting annotation. Existing benchmarks also skirt around the coverage and selection issues. We propose a new task called Mention Detection and Se-lection for entity annotation, along with a new benchmark, WikiC, to better reflect annotation quality. The task is coined centering mentions specific to each position in high-quality human-annotated examples. We also proposed a new framework, DrWiki, to fulfill the task. We adopt a deep pre-trained span selection model inferring directly from plain text via tokens' context embedding. It can cover all possible spans and avoid limiting to mention-entity maps. In addition, information of both inarguable mention-entity pairs, and mention repeat has been introduced as token-wise representation enhancement by FLAT attention and repeat embedding respectively. Empirical results on WikiC show that, compared with often adopted and state-of-the-art Entity Linking and Entity Recognition methods, our method achieves improvement to previous methods in overall performance. Additional experiments show that DrWiki gains improvement even with a low-coverage mention-entity map.\",\"PeriodicalId\":106974,\"journal\":{\"name\":\"2022 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN55064.2022.9892289\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

维基百科中的实体注释(官方命名为wikilinks)极大地造福了人类最终用户。人类编辑需要选择所有对人类最终用户最有帮助的提及，并将每个提及链接到维基百科页面。我们的目标是设计一个自动系统，为任何纯文本生成维基百科风格的实体注释。然而，现有的研究要么严重依赖于提及实体图，要么仅限于命名实体。此外，他们忽略了按照维基百科的要求选择适当的提及。因此，他们省略了一些必要的注释，并引入了过多的分散注意力的注释。现有的基准也绕过了覆盖范围和选择问题。为了更好地反映标注质量，我们提出了一个名为提及检测和选择的实体标注任务，以及一个新的基准WikiC。该任务是在高质量的人工注释示例中对特定于每个位置的提及进行集中。我们还提出了一个新的框架，DrWiki，来完成这个任务。我们采用深度预训练的跨度选择模型，通过标记的上下文嵌入直接从纯文本推断。它可以覆盖所有可能的跨度，避免局限于提及实体映射。此外，引入了无可争议的提及实体对信息和提及重复信息，分别通过FLAT关注和重复嵌入作为标记智能表示增强。WikiC上的实证结果表明，与常用的实体链接和实体识别方法相比，我们的方法在整体性能上比以前的方法有所提高。额外的实验表明，即使使用低覆盖率的提及实体图，DrWiki也能获得改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Multi-source Representation Enhancement for Wikipedia-style Entity Annotation

Entity annotation in Wikipedia (officially named wikilinks) greatly benefits human end-users. Human editors are required to select all mentions that are most helpful to human end-users and link each mention to a Wikipedia page. We aim to design an automatic system to generate Wikipedia-style entity annotation for any plain text. However, existing research either rely heavily on mention-entity map or are restricted to named entities only. Besides, they neglect to select the appropriate mentions as Wikipedia requires. As a result, they leave out some necessary annotation and introduce excessive distracting annotation. Existing benchmarks also skirt around the coverage and selection issues. We propose a new task called Mention Detection and Se-lection for entity annotation, along with a new benchmark, WikiC, to better reflect annotation quality. The task is coined centering mentions specific to each position in high-quality human-annotated examples. We also proposed a new framework, DrWiki, to fulfill the task. We adopt a deep pre-trained span selection model inferring directly from plain text via tokens' context embedding. It can cover all possible spans and avoid limiting to mention-entity maps. In addition, information of both inarguable mention-entity pairs, and mention repeat has been introduced as token-wise representation enhancement by FLAT attention and repeat embedding respectively. Empirical results on WikiC show that, compared with often adopted and state-of-the-art Entity Linking and Entity Recognition methods, our method achieves improvement to previous methods in overall performance. Additional experiments show that DrWiki gains improvement even with a low-coverage mention-entity map.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 International Joint Conference on Neural Networks (IJCNN)

自引率

0.00%

发文量