Mohammad Hassan Dianati, M. Sadreddini, A. Rasekh, S. M. Fakhrahmad, H. Taghi-Zadeh
Computer Engineering and Applications, published 2014-07-16. DOI: 10.18495/COMENGAPP.V3I2.57
Words Stemming Based on Structural and Semantic Similarity
Word stemming is an important problem in natural language processing and information retrieval. Most existing stemming methods are language-dependent and are therefore applicable only to particular languages. Given the importance of this issue, this paper proposes a language-independent stemming method. The proposed stemmer uses a bilingual dictionary, and all words in the dictionary are first clustered according to their structural and semantic similarity. The stem of a new, previously unseen word is then found using these clusters. To evaluate the proposed scheme, stemming is performed on both Persian and English words. The results are encouraging and indicate that the proposed method performs well compared with its counterparts.
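The abstract does not say how structural or semantic similarity is measured, which clustering algorithm is used, or how a new word is matched to a cluster. The sketch below is therefore only an illustration of the general two-step idea under stated assumptions, not the authors' method: structural similarity is assumed to be the shared-prefix length divided by the longer word's length; semantic similarity is assumed to be the Jaccard overlap of the two words' translation sets in a bilingual dictionary; clustering is an assumed greedy single pass with hypothetical thresholds `s_thr=0.5` and `m_thr=0.3`; and a cluster's stem is assumed to be the longest common prefix of its members. The toy English-French dictionary `TOY_DICT` and all function names are hypothetical.

```python
import os
from typing import Dict, List, Set

# Hypothetical toy bilingual (English -> French) dictionary. Entries are
# simplified so that inflected forms list their base-form translations.
TOY_DICT: Dict[str, Set[str]] = {
    "cone": {"cône"},
    "conference": {"conférence"},
    "connect": {"relier", "connecter"},
    "connected": {"relier", "connecter"},
    "connection": {"connexion", "relier"},
    "connects": {"relier", "connecter"},
    "connote": {"connoter"},
}


def structural_sim(a: str, b: str) -> float:
    """Assumed measure: shared-prefix length divided by the longer word's length."""
    prefix = os.path.commonprefix([a, b])
    return len(prefix) / max(len(a), len(b))


def semantic_sim(a: str, b: str, bidict: Dict[str, Set[str]]) -> float:
    """Assumed measure: Jaccard overlap of the two words' translation sets."""
    ta, tb = bidict[a], bidict[b]
    return len(ta & tb) / len(ta | tb)


def cluster_words(bidict: Dict[str, Set[str]],
                  s_thr: float = 0.5, m_thr: float = 0.3) -> List[List[str]]:
    """Assumed greedy single-pass clustering; a cluster's first word is its representative."""
    clusters: List[List[str]] = []
    for word in sorted(bidict):
        for cluster in clusters:
            rep = cluster[0]
            # A word joins a cluster only if it is similar in BOTH form and meaning.
            if (structural_sim(word, rep) >= s_thr
                    and semantic_sim(word, rep, bidict) >= m_thr):
                cluster.append(word)
                break
        else:  # no existing cluster was similar enough: start a new one
            clusters.append([word])
    return clusters


def cluster_stems(clusters: List[List[str]]) -> List[str]:
    """Assumed rule: a cluster's stem is the longest common prefix of its members."""
    return [os.path.commonprefix(c) for c in clusters]


def stem(word: str, stems: List[str]) -> str:
    """Return the longest known stem that prefixes `word`, else `word` unchanged."""
    matches = [s for s in stems if s and word.startswith(s)]
    return max(matches, key=len) if matches else word


if __name__ == "__main__":
    clusters = cluster_words(TOY_DICT)
    stems = cluster_stems(clusters)
    print(clusters)
    for w in ["connecting", "conferences", "unknown"]:
        print(w, "->", stem(w, stems))
```

On this toy data, "connect", "connected", "connection" and "connects" form one cluster with stem "connect", so the unseen word "connecting" is stemmed to "connect". The word "connote" shares the prefix "conn" with "connect" and passes the assumed structural threshold, but its translations do not overlap, so the semantic check keeps it in a separate cluster. This illustrates why combining both kinds of similarity can be preferable to surface form alone.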