Words Stemming Based on Structural and Semantic Similarity

Mohammad Hassan Dianati, M. Sadreddini, A. Rasekh, S. M. Fakhrahmad, H. Taghi-Zadeh
Journal article, Computer Engineering and Applications, published 2014-07-16
DOI: 10.18495/COMENGAPP.V3I2.57 (https://doi.org/10.18495/COMENGAPP.V3I2.57)
Citations: 7

Abstract

Word stemming is an important problem in natural language processing and information retrieval. Most existing stemming methods are language-dependent, so the resulting stemmers apply only to particular languages. This paper therefore proposes a stemming method that aims to be language-independent. The proposed stemmer uses a bilingual dictionary, and all of the words in the dictionary are first clustered according to their structural and semantic similarity. The stem of a new incoming word is then found by making use of the previously formed clusters. To evaluate the proposed scheme, stemming is performed on both Persian and English. The encouraging results indicate that the proposed method performs well compared with its counterparts.
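The abstract does not say which similarity measures or clustering procedure the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions. Structural similarity is approximated by shared-prefix length, semantic similarity by the overlap of translations in a bilingual dictionary, and clustering by single-link merging. All function names, thresholds, and the toy dictionary are hypothetical.

```python
from itertools import combinations
from os.path import commonprefix


def structural_sim(a: str, b: str) -> float:
    """Shared-prefix length over the longer word's length (assumed measure)."""
    return len(commonprefix([a, b])) / max(len(a), len(b))


def semantic_sim(a: str, b: str, translations: dict[str, set[str]]) -> float:
    """Jaccard overlap of the two words' translation sets (assumed measure)."""
    ta, tb = translations[a], translations[b]
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster_words(translations, struct_min=0.5, sem_min=0.3):
    """Single-link clustering: merge two words when both similarities pass."""
    parent = {w: w for w in translations}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(translations, 2):
        if (structural_sim(a, b) >= struct_min
                and semantic_sim(a, b, translations) >= sem_min):
            parent[find(a)] = find(b)

    groups = {}
    for w in translations:
        groups.setdefault(find(w), []).append(w)
    # Assumption: a cluster's stem is the longest common prefix of its members.
    # Clusters that share the same prefix would collide in this toy mapping.
    return {commonprefix(members): members for members in groups.values()}


def stem(word, clusters, struct_min=0.5):
    """Return the stem of the structurally closest cluster, else the word itself."""
    best = max(clusters, key=lambda s: structural_sim(word, s), default=None)
    if best is not None and structural_sim(word, best) >= struct_min:
        return best
    return word


# Hypothetical bilingual dictionary: source word -> set of translations.
toy_dictionary = {
    "connect": {"link", "join"},
    "connected": {"linked", "join"},
    "connection": {"link", "joining"},
    "contest": {"competition"},
}
clusters = cluster_words(toy_dictionary)
print(stem("connecting", clusters))  # prints "connect"
```

In this sketch a new word is matched to a cluster by structural similarity alone, because an unseen word may have no dictionary entry to compare semantically. That is a simplification made here, not a claim about the paper's procedure.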