On the Utility of Recovering Dropped Words in Chinese Named Entity Recognition

Dunbo Cai, Zhiguo Huang, Ling Qian
Proceedings of the 2021 6th International Conference on Intelligent Information Technology. Published 2021-02-25.
DOI: 10.1145/3460179.3460189 (https://doi.org/10.1145/3460179.3460189)
Citations: 1

Named entity recognition (NER) in natural language processing (NLP) is the problem of identifying a sequence of words in a sentence that mentions an object (entity) of a predefined type, e.g., person, organization, location, or time. NER methods are key to extracting knowledge from text, because entities are the anchors to which entity properties and entity relations are attached. However, NER for Chinese text is trickier because some auxiliary words may be dropped from a sentence, a common practice in Chinese writing for the sake of brevity. One commonly dropped Chinese word is ‘的’ (which often functions like the word ‘of’ in English). One obvious effect of this omission is that it becomes harder to identify the sub-entities (or nested named entities) contained in a named entity. Previous work has considered the effect of recovering dropped pronouns in Chinese translation tasks. Here we propose a rule-based method to recover the auxiliary word ‘的’ in Chinese text, and study the effect of this recovery on the performance of FLAT, a state-of-the-art Chinese NER method. Experimental results on the Weibo-NER and MSRA-NER datasets show that our method improves on FLAT. This study thus highlights the promise of recovering more types of dropped words for the Chinese NER problem.
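The abstract does not spell out the recovery rules, so the following is only a minimal sketch of what a rule-based ‘的’ recovery step could look like, not the authors' method. It assumes a single hypothetical rule: when a known modifier entity is immediately followed by a known head noun, insert ‘的’ between them. The lexicons `MODIFIERS` and `HEAD_NOUNS` and the function `recover_de` are illustrative names invented here.

```python
import re

# Hypothetical mini-lexicons for illustration only; the paper's actual
# rules and word lists are not given in the abstract.
MODIFIERS = ["中国", "北京", "微软"]    # entities that can modify a noun
HEAD_NOUNS = ["首都", "市长", "员工"]    # nouns that can head the phrase

# Match a modifier only when a head noun follows it directly.
# The lookahead keeps the head noun out of the match, so it is left untouched.
_PATTERN = re.compile(
    "(" + "|".join(map(re.escape, MODIFIERS)) + ")"
    "(?=" + "|".join(map(re.escape, HEAD_NOUNS)) + ")"
)


def recover_de(text: str) -> str:
    """Insert '的' between a known modifier and a known head noun."""
    return _PATTERN.sub(lambda m: m.group(1) + "的", text)


if __name__ == "__main__":
    print(recover_de("中国首都是北京"))      # 中国的首都是北京
    print(recover_de("中国的首都是北京"))    # unchanged: '的' is already present
```

In a pipeline like the one the abstract describes, the recovered text would then be passed to the NER model (FLAT in the paper) in place of the raw sentence; that step is not shown here.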