Multi-source Representation Enhancement for Wikipedia-style Entity Annotation
Authors: Kunyuan Pang, Shasha Li, Jintao Tang, Ting Wang
Published in: 2022 International Joint Conference on Neural Networks (IJCNN), 2022-07-18
DOI: 10.1109/IJCNN55064.2022.9892289 (https://doi.org/10.1109/IJCNN55064.2022.9892289)
Entity annotation in Wikipedia (officially called wikilinks) greatly benefits human end-users. Human editors are required to select all mentions that are most helpful to readers and link each mention to a Wikipedia page. We aim to design an automatic system that generates Wikipedia-style entity annotation for any plain text. However, existing research either relies heavily on a mention-entity map or is restricted to named entities only. It also neglects to select the appropriate mentions as Wikipedia requires. As a result, existing systems leave out some necessary annotations and introduce excessive, distracting ones. Existing benchmarks also skirt the coverage and selection issues. We propose a new task, Mention Detection and Selection for entity annotation, along with a new benchmark, WikiC, to better reflect annotation quality. The task centers on the mentions that are appropriate at each position, as found in high-quality human-annotated examples. We also propose a new framework, DrWiki, to fulfill the task. We adopt a deep pre-trained span selection model that infers directly from plain text via the tokens' contextual embeddings. It can cover all possible spans and is not limited to the entries of a mention-entity map. In addition, information about both inarguable mention-entity pairs and mention repeats is introduced as token-wise representation enhancement, through FLAT attention and a repeat embedding respectively. Empirical results on WikiC show that our method improves overall performance over commonly adopted and state-of-the-art Entity Linking and Entity Recognition methods. Additional experiments show that DrWiki still gains improvement even with a low-coverage mention-entity map.
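To make two ideas from the abstract concrete, the following is a minimal sketch, not the DrWiki implementation. It shows how "covering all possible spans" can be read as enumerating every token span up to a length limit, and how a token-wise repeat signal might be derived before being fed to an embedding layer. The function names, the `max_len` limit, and the rule "a token counts as a repeat if its lowercased form occurred earlier" are all assumptions made for illustration; the paper's actual span scoring, FLAT attention, and repeat embedding are not reproduced here.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Return every (start, end) span, end exclusive, of at most max_len tokens.

    Hypothetical helper: candidate spans come from the text itself,
    not from a lookup in a mention-entity map.
    """
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1)
    ]


def repeat_flags(tokens: List[str]) -> List[int]:
    """Return 1 for a token whose lowercased form appeared earlier, else 0.

    Assumed simplification: a real system would likely track repeated
    mentions rather than single tokens, and map the flag to a learned vector.
    """
    seen = set()
    flags = []
    for token in tokens:
        key = token.lower()
        flags.append(1 if key in seen else 0)
        seen.add(key)
    return flags


tokens = ["Paris", "is", "in", "France", ".", "Paris", "is", "large"]
spans = enumerate_spans(tokens, max_len=2)
flags = repeat_flags(tokens)
```

In this sketch the second occurrence of "Paris" receives a repeat flag of 1, which is the kind of signal a selection model could use to decide whether a repeated mention should be annotated again.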