How to Enhance Chinese Word Segmentation Using Knowledge Graphs

Kunhui Lin, Wenyuan Du, Xiaoli Wang, Meihong Wang, Zixiang Yang
{"title":"How to Enhance Chinese Word Segmentation Using Knowledge Graphs","authors":"Kunhui Lin, Wenyuan Du, Xiaoli Wang, Meihong Wang, Zixiang Yang","doi":"10.1109/ICCSE.2018.8468759","DOIUrl":null,"url":null,"abstract":"Chinese word segmentation is a very important problem for Chinese information processing. Chinese word segmentation results are the basis for computers to understand natural language. However, unlike most Western languages, Chinese words do not have fixed symbols like white space as word segmentation marks. Moreover, Chinese has a very complex grammar, and the word segmentation criteria are varied with the contexts. Therefore, Chinese word segmentation is a very difficult task. Many existing works have proposed many algorithms to solve this problem. However, to our best knowledge, none of them could outperform all the other methods. In this paper, we develop a novel algorithm based on semantics and contexts. We propose a semantic-based word similarity measure using the concept hierarchy in knowledge graphs, and use this measure to prune the different results which are generated by several state-of-the-art Chinese word segmentation methods. The idea is to respectively compute the concept similarity of these words to other words in the text, and choose the word with the highest concept similarity score. To evaluate the effectiveness of the proposed approach, we conduct a series of experiment on two real datasets. The results show that our method outperforms all the state-of-the-art algorithms by filtering out wrong results and retaining correct ones.","PeriodicalId":228760,"journal":{"name":"2018 13th International Conference on Computer Science & Education (ICCSE)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 13th International Conference on Computer Science & Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2018.8468759","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Chinese word segmentation is a fundamental problem in Chinese information processing: segmentation results are the basis on which computers understand Chinese natural language. Unlike most Western languages, however, Chinese does not use explicit delimiters such as white space to mark word boundaries. Moreover, Chinese grammar is complex, and segmentation criteria vary with context, which makes Chinese word segmentation a difficult task. Many algorithms have been proposed to solve this problem, but to the best of our knowledge none of them consistently outperforms all the others. In this paper, we develop a novel algorithm based on semantics and context. We propose a semantic word similarity measure built on the concept hierarchy of a knowledge graph, and use this measure to prune the differing results generated by several state-of-the-art Chinese word segmentation methods. The idea is to compute, for each candidate word, its concept similarity to the other words in the text, and to choose the candidate with the highest similarity score. To evaluate the effectiveness of the proposed approach, we conduct a series of experiments on two real datasets. The results show that our method outperforms the state-of-the-art algorithms by filtering out wrong segmentations while retaining correct ones.
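
To make the pruning idea concrete, the following is a minimal Python sketch under assumptions not stated in the abstract: a toy hypernym table stands in for the knowledge graph's concept hierarchy, and a Wu-Palmer-style measure stands in for the paper's similarity function. Given words on which several segmenters disagree, each candidate is scored by its average concept similarity to the surrounding context words, and the highest-scoring candidate is kept.

# A minimal sketch of the pruning step, not the paper's implementation.
# The toy hypernym table, the Wu-Palmer-style formula, and the names
# below (HYPERNYM, concept_similarity, choose_word) are illustrative
# assumptions; the paper's knowledge graph and measure may differ.

HYPERNYM = {  # child concept -> parent concept (toy hierarchy)
    "苹果": "水果", "香蕉": "水果", "水果": "食物", "食物": "实体",
    "汽车": "交通工具", "交通工具": "实体",
}

def path_to_root(concept):
    """Concepts from `concept` up to the hierarchy root, inclusive."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

def concept_similarity(a, b):
    """Wu-Palmer-style score: 2*depth(lca) / (depth(a) + depth(b))."""
    depth = lambda c: len(path_to_root(c))  # depth counted from the root
    ancestors_a = set(path_to_root(a))
    # Lowest common ancestor: first concept on b's path that a also reaches.
    lca = next((c for c in path_to_root(b) if c in ancestors_a), None)
    return 2.0 * depth(lca) / (depth(a) + depth(b)) if lca else 0.0

def choose_word(candidates, context_words):
    """Keep the candidate whose mean similarity to the context is highest."""
    def score(word):
        sims = [concept_similarity(word, c) for c in context_words]
        return sum(sims) / len(sims) if sims else 0.0
    return max(candidates, key=score)

# Two segmenters disagree; the context favors the fruit reading, since
# 苹果 ("apple") shares the nearby ancestor 水果 ("fruit") with the context.
print(choose_word(["苹果", "汽车"], ["香蕉", "食物"]))  # -> 苹果

Here the disambiguation needs no training data: it relies only on the concept hierarchy, which is why the abstract frames the method as a filter applied on top of existing segmenters rather than as a segmenter of its own.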