EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition

Hongfei Yu, Kunyu Ni, Rongkang Xu, Wenjun Yu, Yu Huang
Wuhan University Journal of Natural Sciences (JCR Q3, Multidisciplinary), journal article published 2023-08-01. DOI: 10.1051/wujns/2023284299
Citations: 0

Abstract

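Data augmentation methods are often used to address data scarcity in natural language processing (NLP). However, token-label misalignment, in which tokens in an augmented sentence are paired with incorrect entity labels, keeps data augmentation methods from achieving high scores on token-level tasks such as named entity recognition (NER). In this paper, we propose embedded prompt tuning (EPT), a novel data augmentation approach for low-resource NER. To address token-label misalignment, we implicitly embed NER labels as prompts into the hidden layer of a pre-trained language model, so that masked entity tokens can be predicted by the fine-tuned EPT model. As a result, EPT can generate high-quality, highly diverse data containing a variety of entities, which improves NER performance. Because cross-domain NER datasets are available, we also explore NER domain adaptation with EPT. Experimental results show that EPT achieves substantial improvements over baseline methods on low-resource NER tasks.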
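The abstract does not give implementation details. The Python sketch below therefore only illustrates, under stated assumptions, the two ideas the abstract names.

Token-label misalignment arises when augmentation puts a token into a slot whose entity label no longer fits it. Masking the entity tokens and then refilling each mask with exactly one label-consistent token keeps the original BIO label sequence valid.

The `predict` callables stand in for the fine-tuned EPT model. The candidate pools (`TOY_CANDIDATES`), all function names, and the example sentence are hypothetical. Nothing here reproduces how EPT actually embeds label prompts into the hidden layer.

```python
import random
from typing import Callable, Dict, List

MASK = "[MASK]"

# Hypothetical per-label candidate pools standing in for what a label-aware,
# fine-tuned masked language model might propose. Not taken from the paper.
TOY_CANDIDATES: Dict[str, List[str]] = {
    "B-PER": ["Alice", "Bob"],
    "B-LOC": ["Paris", "Tokyo"],
}

# A predictor receives the partially filled sentence, the mask index, and the
# gold label at that index, and returns one replacement token.
Predictor = Callable[[List[str], int, str], str]


def mask_entities(tokens: List[str], labels: List[str]) -> List[str]:
    """Replace every entity token (any BIO label other than 'O') with MASK."""
    return [MASK if lab != "O" else tok for tok, lab in zip(tokens, labels)]


def fill_masks(masked: List[str], labels: List[str], predict: Predictor) -> List[str]:
    """Fill masks left to right, one token per mask, so the augmented sentence
    can reuse the original label sequence unchanged."""
    out = list(masked)
    for i, tok in enumerate(out):
        if tok == MASK:
            out[i] = predict(out, i, labels[i])
    return out


def label_blind(context: List[str], i: int, label: str) -> str:
    """Stand-in for a masked LM that ignores the label: it can put a
    non-entity word into an entity slot, i.e. token-label misalignment."""
    return "yesterday"


def make_label_aware(rng: random.Random) -> Predictor:
    """Stand-in for a label-conditioned predictor (the role the abstract gives
    to EPT's label prompts): it only draws tokens from the slot label's pool."""
    def predict(context: List[str], i: int, label: str) -> str:
        return rng.choice(TOY_CANDIDATES[label])  # KeyError for labels without a pool
    return predict


def misaligned_positions(tokens: List[str], labels: List[str]) -> List[int]:
    """Toy check: entity positions whose token is not in that label's pool."""
    return [i for i, (tok, lab) in enumerate(zip(tokens, labels))
            if lab != "O" and tok not in TOY_CANDIDATES.get(lab, [])]


if __name__ == "__main__":
    tokens = ["John", "visited", "Berlin", "."]
    labels = ["B-PER", "O", "B-LOC", "O"]
    masked = mask_entities(tokens, labels)
    blind = fill_masks(masked, labels, label_blind)
    aware = fill_masks(masked, labels, make_label_aware(random.Random(0)))
    print(blind, misaligned_positions(blind, labels))  # misaligned: [0, 2]
    print(aware, misaligned_positions(aware, labels))  # misaligned: []
```

In this toy setting, the one-token-per-mask replacement is what keeps `len(tokens) == len(labels)`. The label-conditioned predictor is what keeps each new entity token consistent with its label. A real masked language model works on subword pieces, so a word may span several masks; the sketch ignores that.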

Source journal
Wuhan University Journal of Natural Sciences (Multidisciplinary)
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 2485
Journal description: Wuhan University Journal of Natural Sciences aims to promote rapid communication and exchange between the world and Wuhan University, as well as other Chinese universities and academic institutions. It mainly reports the latest advances in many disciplines of scientific research at Chinese universities and academic institutions. The journal also publishes papers presented at conferences in China and abroad. Its multidisciplinary nature is apparent in the wide range of articles from leading Chinese scholars. The journal also aims to introduce Chinese academic achievements to the world community by demonstrating the significance of Chinese scientific investigations.

Latest articles in this journal
Comprehensive Analysis of the Role of Forkhead Box J3 (FOXJ3) in Human Cancers
Three New Classes of Subsystem Codes
A Note of the Interpolating Sequence in Qp∩H∞
Learning Label Correlations for Multi-Label Online Passive Aggressive Classification Algorithm
Uniform Asymptotics for Finite-Time Ruin Probabilities of Risk Models with Non-Stationary Arrivals and Strongly Subexponential Claim Sizes