EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition

Pub Date: 2023-08-01 | DOI: 10.1051/wujns/2023284299

Hongfei Yu, Kunyu Ni, Rongkang Xu, Wenjun Yu, Yu Huang




EPT: Data Augmentation with Embedded Prompt Tuning for Low-Resource Named Entity Recognition

Data augmentation methods are often used to address data scarcity in natural language processing (NLP). However, token-label misalignment, in which tokens in the augmented sentences are paired with incorrect entity labels, prevents these methods from achieving high scores on token-level tasks such as named entity recognition (NER). In this paper, we propose embedded prompt tuning (EPT) as a novel data augmentation approach for low-resource NER. To address token-label misalignment, we implicitly embed NER labels as prompts into the hidden layer of a pre-trained language model, so that masked entity tokens can be predicted by the fine-tuned EPT. EPT can therefore generate high-quality, highly diverse data containing a variety of entities, which improves NER performance. Because cross-domain NER datasets are available, we also explore NER domain adaptation with EPT. The experimental results show that EPT achieves substantial improvement over the baseline methods on low-resource NER tasks.
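
To make the mask-and-predict idea concrete, here is a minimal Python sketch of the general pattern only. It is not the authors' implementation and does not embed labels into any hidden layer: `TOY_VOCAB`, `toy_fill`, `mask_entities`, and `augment` are hypothetical names, and the toy vocabulary merely stands in for a fine-tuned language model. What it shows is that only entity positions are masked and refilled, conditioned on their labels, while the label sequence is carried over unchanged, so tokens and labels cannot drift out of alignment.

```python
import random
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"

# Hypothetical label-conditioned vocabulary; a stand-in for a fine-tuned model.
TOY_VOCAB: Dict[str, List[str]] = {
    "B-PER": ["Alice", "Bob"],
    "I-PER": ["Smith", "Jones"],
    "B-LOC": ["Paris", "Berlin"],
}


def mask_entities(tokens: List[str], labels: List[str]) -> List[str]:
    """Replace every entity token (label other than 'O') with MASK."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [MASK if lab != "O" else tok for tok, lab in zip(tokens, labels)]


def toy_fill(label: str, rng: random.Random) -> str:
    """Pick a replacement token for a masked position, given its label."""
    return rng.choice(TOY_VOCAB[label])


def augment(
    tokens: List[str],
    labels: List[str],
    fill: Callable[[str, random.Random], str] = toy_fill,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Mask entity tokens, refill them per label, and keep labels as-is."""
    rng = random.Random(seed)
    masked = mask_entities(tokens, labels)
    new_tokens = [
        fill(lab, rng) if lab != "O" else tok
        for tok, lab in zip(masked, labels)
    ]
    # Labels are copied unchanged: position i keeps its original tag.
    return new_tokens, list(labels)


tokens = ["John", "Smith", "lives", "in", "London", "."]
labels = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
new_tokens, new_labels = augment(tokens, labels)
print(list(zip(new_tokens, new_labels)))
```

In this sketch the context tokens ("lives", "in", ".") are never altered, and each refilled token is drawn from candidates for the label already attached to that position. In a real system the `fill` callable would be replaced by a model prediction.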
