Error feedback based lexical entity extraction for Chinese language modeling

Yi Liu, Jing Hua, Xiangang Li, Xihong Wu
2013 6th International Congress on Image and Signal Processing (CISP), December 2013. DOI: 10.1109/CISP.2013.6743873

Abstract

Unlike Western languages, Chinese has no standard definition of a word, so choosing a suitable lexicon plays an important role in Chinese language modeling. This paper proposes a novel method for constructing the lexicon automatically. Rather than relying on statistical measures of text features, the method is driven directly by error feedback from the target task, which in this paper is phoneme-to-grapheme conversion. The process consists of two iterative phases: Phase One selects individual words from a large manually built lexicon, and Phase Two extracts compound words on the basis of the Phase One lexicon. Experiments on phoneme-to-grapheme conversion show that the method achieves absolute reductions in character error rate of 1.09% for Phase One and 0.38% for Phase Two, compared with baseline lexicons of the same size generated by the conventional word-frequency-based method.
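The abstract does not spell out the selection procedure, so the following is only a minimal sketch of what "error feedback" lexicon selection could look like, under stated assumptions. It assumes greedy forward selection (each round keeps the candidate word whose addition lowers the task error most) and measures error as character error rate (CER), i.e. character-level edit distance divided by reference length. `toy_decode`, `dev_cer` and `select_by_error_feedback` are hypothetical names; `toy_decode` is a stand-in for a real phoneme-to-grapheme decoder and simply marks characters not covered by any lexicon word as errors.

```python
from typing import Callable, Iterable, List, Set


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over characters divided by reference length."""
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))  # distances for the empty reference prefix
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[m] / max(n, 1)


def toy_decode(reference: str, lexicon: Set[str]) -> str:
    """HYPOTHETICAL stand-in for a phoneme-to-grapheme decoder: characters
    covered by a lexicon word (greedy longest match) come out correct;
    uncovered characters are emitted as '?', i.e. substitution errors."""
    out: List[str] = []
    max_len = max((len(w) for w in lexicon), default=1)
    i = 0
    while i < len(reference):
        for size in range(min(max_len, len(reference) - i), 0, -1):
            if reference[i:i + size] in lexicon:
                out.append(reference[i:i + size])
                i += size
                break
        else:  # no lexicon word starts at position i
            out.append("?")
            i += 1
    return "".join(out)


def dev_cer(lexicon: Set[str], dev_set: List[str]) -> float:
    """Mean CER of the toy decoder over a development set (assumed metric)."""
    return sum(char_error_rate(r, toy_decode(r, lexicon)) for r in dev_set) / len(dev_set)


def select_by_error_feedback(candidates: Iterable[str], base: Set[str],
                             evaluate: Callable[[Set[str]], float],
                             target_size: int) -> Set[str]:
    """Greedy forward selection driven by task error: each round adds the
    candidate whose inclusion lowers evaluate() the most; stops at
    target_size or when no remaining candidate reduces the error."""
    lexicon = set(base)
    pool = [w for w in candidates if w not in lexicon]
    best_err = evaluate(lexicon)
    while len(lexicon) < target_size and pool:
        err, word = min((evaluate(lexicon | {w}), w) for w in pool)
        if err >= best_err:
            break
        lexicon.add(word)
        pool.remove(word)
        best_err = err
    return lexicon


dev = ["语言模型", "汉语词汇"]
candidates = ["语言", "模型", "汉语", "词汇", "苹果"]
selected = select_by_error_feedback(
    candidates, set(), lambda lex: dev_cer(lex, dev), target_size=10)
print(sorted(selected), dev_cer(selected, dev))
```

Here the unhelpful candidate 苹果 is never kept, because adding it does not lower the development-set error. A frequency-based baseline, by contrast, would rank candidates by corpus counts alone. Under the same assumptions, a second pass in the spirit of Phase Two could rerun the same loop with compound candidates (for example, concatenations of adjacent Phase One words), but the paper's actual candidate generation is not described here.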