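Research on Intelligent Construction of China English Network New Words Database Based on Adjacent Entropy Recognition Algorithm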

Y. Zu
2022 IEEE 4th Eurasia Conference on IOT, Communication and Engineering (ECICE)
Published: 2022-10-28
DOI: 10.1109/ECICE55674.2022.10042904 (https://doi.org/10.1109/ECICE55674.2022.10042904)
Citations: 0

Abstract

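As the Internet has developed and spread, it has come to play an important role in people's daily life, work, and the dissemination of trending social information. Most China English neologisms found online spread widely through Internet platforms, where people come to know and use them. New word recognition plays an important role in Chinese word segmentation and information retrieval. With a large number of new words emerging in China English, the lack of a China English neologism database has become a major obstacle to the study of China English, and new word recognition is a major technical issue in building such a corpus. To address the shortcomings of existing new word recognition algorithms, namely how pointwise mutual information captures the internal cohesion of words and how low adjacency entropy is handled, a new word recognition algorithm for China English is proposed. The algorithm also addresses two problems that arise when pointwise mutual information is used to identify new words: a single threshold setting that lets invalid phrases through, and a low threshold for new word groups. The experimental results show that, under the same data and experimental environment, the method improves accuracy, recall, and F-value, and is effective and feasible for corpus construction.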
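The abstract names two statistics, pointwise mutual information (internal cohesion) and adjacency entropy (freedom of the left and right context), but does not give the paper's formulas or thresholds. The following is a minimal sketch of the textbook versions of both measures for two-token candidates, not the author's algorithm; the function names, the `pmi_min` and `entropy_min` parameters, and the `min_count` filter are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(neighbours):
    """Shannon entropy (base 2) of a Counter of neighbouring tokens."""
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


def adjacency_entropy(tokens, bigram):
    """Return (left, right) entropy of the tokens adjacent to `bigram`."""
    left, right = Counter(), Counter()
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == bigram:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[tokens[i + 2]] += 1
    return entropy(left), entropy(right)


def pmi(bigram, unigrams, bigrams, n_tokens):
    """Pointwise mutual information of a two-token candidate (base 2)."""
    x, y = bigram
    p_xy = bigrams[bigram] / (n_tokens - 1)  # n_tokens - 1 bigram positions
    p_x = unigrams[x] / n_tokens
    p_y = unigrams[y] / n_tokens
    return math.log2(p_xy / (p_x * p_y))


def candidate_new_words(tokens, pmi_min, entropy_min, min_count=1):
    """Keep bigrams whose PMI and both adjacency entropies pass the thresholds.

    The thresholds are hypothetical inputs; the paper's values are not
    stated in the abstract.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    found = {}
    for bigram, count in bigrams.items():
        if count < min_count:
            continue
        score = pmi(bigram, unigrams, bigrams, len(tokens))
        left_h, right_h = adjacency_entropy(tokens, bigram)
        if score >= pmi_min and min(left_h, right_h) >= entropy_min:
            found[bigram] = (score, left_h, right_h)
    return found
```

A candidate with high PMI but a fixed neighbour on one side (entropy near zero) is likely a fragment of a longer phrase, which is why the sketch requires both scores to pass; a single global PMI threshold is the kind of setting the abstract describes as problematic.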