Text Classification based on Discriminative-Semantic Features and Variance of Fuzzy Similarity

Pouyan Parsafard, H. Veisi, Niloofar Aflaki, Siamak Mirzaei
DOI: 10.5815/ijisa.2022.02.03 (https://doi.org/10.5815/ijisa.2022.02.03)
Published: 2022-04-08, Journal Article, International Journal of Intelligent Systems and Applications in Engineering (JCR: Q3, Computer Science)

Abstract

Due to the rapid growth of the Internet, large amounts of unlabelled textual data are produced daily. Identifying the subject of a text document is a primary source of information in text processing applications. In this paper, a text classification method is presented and evaluated for Persian and English. The proposed technique uses the variance of fuzzy similarity alongside discriminative and semantic feature selection methods. Discriminative features are those that distinguish categories most strongly, while semantic features bring the similarity between features and documents into the calculation, using only the available documents. The proposed method incorporates fuzzy weighting as a measure of similarity. The fuzzy weights are derived from the concept of fuzzy similarity, defined as the variance of a document's membership values across all categories: a document may belong to several categories at once, each with some membership value, and these membership values must sum to 1. The method is evaluated on three datasets (one Persian and two English) using two classifiers, a support vector machine (SVM) and an artificial neural network (ANN). Comparison with other text classification methods demonstrates the consistent superiority of the proposed technique in all cases. The weighted average F-measure of our method is 82% and 97.8% for the classification of Persian and English documents, respectively.
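The abstract describes fuzzy similarity as the variance of a document's membership values over all categories, with the memberships summing to 1. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract gives no formula, so the normalization step, the use of population variance, and the function names memberships and membership_variance are all assumptions made for illustration.

```python
from statistics import pvariance


def memberships(similarities):
    """Normalize non-negative per-category similarities so they sum to 1.

    Assumption: memberships are obtained by simple normalization; the
    paper may derive them differently.
    """
    total = sum(similarities)
    if total <= 0:
        # No evidence for any category: fall back to uniform membership.
        return [1.0 / len(similarities)] * len(similarities)
    return [s / total for s in similarities]


def membership_variance(similarities):
    """Population variance of the membership values across categories.

    A high variance means the document leans clearly toward few
    categories; a variance of 0 means it is spread evenly over all.
    """
    return pvariance(memberships(similarities))


# Hypothetical similarity scores of two documents to three categories.
focused = membership_variance([9.0, 0.5, 0.5])    # one dominant category
ambiguous = membership_variance([1.0, 1.0, 1.0])  # evenly spread
```

Under these assumptions, a document with one dominant category gets a larger variance than an evenly spread one, which is what would make the variance usable as a weight on how decisively a document belongs to a category.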
Source journal
International Journal of Intelligent Systems and Applications in Engineering
Subject area: Computer Science, Computer Graphics and Computer-Aided Design
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 18
Latest articles from this journal:
Predicting Automobile Stock Prices Index in the Tehran Stock Exchange Using Machine Learning Models
A Hybrid Unsupervised Density-based Approach with Mutual Information for Text Outlier Detection
Digital Control and Management of Water Supply Infrastructure Using Embedded Systems and Machine Learning
Machine Learning for Weather Forecasting: XGBoost vs SVM vs Random Forest in Predicting Temperature for Visakhapatnam
An Enhanced Approach to Recommend Data Structures and Algorithms Problems Using Content-based Filtering