Named Entity Recognition of Zhuang Language Based on the Feature of Initial Letter in Word

Weiquan Zhang, Suqin Tang, Danni He, Tinghui Li, Changchun Pan
Published in: Proceedings of the 2022 6th International Conference on Innovation in Artificial Intelligence
Publication date: 2022-03-04
DOI: 10.1145/3529466.3529478 (https://doi.org/10.1145/3529466.3529478)
Citations: 0

Abstract

Named entity recognition (NER) is an important task and a foundation for intelligent information processing and knowledge representation learning in the Zhuang language. Because Zhuang lacks annotated corpora for named entity labeling, this study proposes a BiLSTM-CNN-CRF network model that incorporates the letter case (uppercase or lowercase) of each word's initial letter. First, word2vec is trained on unlabeled Zhuang text to obtain Zhuang word vectors. Next, a convolutional neural network extracts character-level features from Zhuang words, yielding a character feature vector. These two vectors are concatenated with a randomly initialized initial-letter case feature vector, and the concatenated vector is fed into the BiLSTM-CNN-CRF model for training, producing an end-to-end NER model for Zhuang. Experimental results show that, without relying on hand-crafted features or external dictionaries, the proposed method outperforms the baseline models, achieving an F1 score of 80.37% on the named entity recognition task and enabling automated named entity recognition for Zhuang.
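
The input representation described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the three case classes, all dimensions, the filter width, and the helper names (`initial_case_class`, `random_table`, `char_cnn`, `token_features`) are assumptions made for illustration, the word vectors are random stand-ins for trained word2vec vectors, and the BiLSTM and CRF layers that would consume the concatenated vector are omitted.

```python
import random
import string

# All sizes and the label set below are illustrative assumptions.
WORD_DIM, CHAR_DIM, NUM_FILTERS, CASE_DIM, WIDTH = 8, 4, 5, 3, 3
CASE_CLASSES = ("upper_initial", "lower_initial", "other")


def initial_case_class(word):
    """Classify a word by the case of its first character."""
    if word and word[0].isalpha():
        return "upper_initial" if word[0].isupper() else "lower_initial"
    return "other"


def random_table(keys, dim, rng):
    """Randomly initialised lookup table: one vector of length dim per key."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def char_cnn(word, char_table, filters, width=WIDTH):
    """1-D convolution over character vectors, then max-over-time pooling."""
    dim = len(next(iter(char_table.values())))
    pad = [[0.0] * dim] * (width - 1)
    rows = pad + [char_table.get(c, [0.0] * dim) for c in word] + pad
    pooled = []
    for f in filters:  # each filter holds width * dim weights
        scores = []
        for i in range(len(rows) - width + 1):
            window = [x for row in rows[i:i + width] for x in row]
            scores.append(sum(w * x for w, x in zip(f, window)))
        pooled.append(max(scores))
    return pooled


def token_features(word, word_table, char_table, filters, case_table):
    """Concatenate word, character-CNN, and initial-letter case vectors."""
    word_vec = word_table.get(word.lower(), [0.0] * WORD_DIM)
    char_vec = char_cnn(word, char_table, filters)
    case_vec = case_table[initial_case_class(word)]
    return word_vec + char_vec + case_vec


rng = random.Random(0)
word_table = random_table(["gvangjsih"], WORD_DIM, rng)  # stand-in for word2vec
char_table = random_table(string.ascii_letters, CHAR_DIM, rng)
filters = [[rng.uniform(-0.1, 0.1) for _ in range(WIDTH * CHAR_DIM)]
           for _ in range(NUM_FILTERS)]
case_table = random_table(CASE_CLASSES, CASE_DIM, rng)

for token in ["Gvangjsih", "gvangjsih", "2022"]:
    vec = token_features(token, word_table, char_table, filters, case_table)
    print(token, initial_case_class(token), len(vec))
```

In this sketch each token maps to a vector of length `WORD_DIM + NUM_FILTERS + CASE_DIM`. In a trainable model the case table and convolution filters would be learned along with the rest of the network rather than left at their random initial values.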