Pouyan Parsafard, H. Veisi, Niloofar Aflaki, Siamak Mirzaei
Title: Text Classification based on Discriminative-Semantic Features and Variance of Fuzzy Similarity
Journal: International Journal of Intelligent Systems and Applications in Engineering
Published: 2022-04-08
DOI: 10.5815/ijisa.2022.02.03 (https://doi.org/10.5815/ijisa.2022.02.03)
Abstract
Due to the rapid growth of the Internet, large amounts of unlabelled textual data are produced daily. Identifying the subject of a text document is a primary source of information in text processing applications. In this paper, a text classification method is presented and evaluated for Persian and English. The proposed technique uses the variance of fuzzy similarity alongside discriminative and semantic feature selection methods. Discriminative features are those that distinguish categories most strongly, while semantic features bring the similarity between features and documents into the calculation, using only the available documents. The proposed method incorporates fuzzy weighting as a measure of similarity. The fuzzy weights are derived from the concept of fuzzy similarity, defined as the variance of a document's membership values across all categories, where a document can belong to several categories at the same time, each with some membership value, and these membership values sum to 1. The proposed document classification method is evaluated on three datasets (one Persian and two English) using two classifiers: support vector machine (SVM) and artificial neural network (ANN). Comparison with other text classification methods demonstrates the consistent superiority of the proposed technique in all cases. The weighted average F-measure of our method is 82% and 97.8% in the classification of Persian and English documents, respectively.
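The idea of taking the variance of a document's membership values can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the function names, the normalization by the sum of scores, the use of population variance, and the example scores are all hypothetical and are not the authors' implementation.

```python
from statistics import pvariance


def normalize_memberships(scores):
    """Scale non-negative category scores so that they sum to 1.

    Assumption: memberships are obtained by dividing each raw score by the
    total. The paper may derive membership values differently.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return [s / total for s in scores]


def membership_variance(memberships):
    """Population variance of one document's memberships across categories."""
    return pvariance(memberships)


# Hypothetical raw similarity scores of two documents to three categories.
focused = normalize_memberships([8.0, 1.0, 1.0])    # dominated by one category
ambiguous = normalize_memberships([1.0, 1.0, 1.0])  # spread evenly

focused_var = membership_variance(focused)
ambiguous_var = membership_variance(ambiguous)

print(f"focused:   memberships={focused}, variance={focused_var:.4f}")
print(f"ambiguous: memberships={ambiguous}, variance={ambiguous_var:.4f}")
```

Under these assumptions, a document whose membership is concentrated in one category has a higher variance than a document spread evenly over all categories, which is the kind of signal a variance-based fuzzy weight could exploit.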