Dimension reduction based on categorical fuzzy correlation degree for document categorization

Qiang Li, Liang He, Xin Lin
Published in: 2013 IEEE International Conference on Granular Computing (GrC), 2013-12-01
DOI: 10.1109/GrC.2013.6740405 (https://doi.org/10.1109/GrC.2013.6740405)
Citations: 3

Abstract

High dimensionality of the feature space is a common problem in document categorization. Most features selected by conventional feature selection algorithms such as information gain (IG) are relevant but also redundant. This paper proposes a two-step feature selection method. In the first step, a redundancy analysis of the original features, based on categorical fuzzy correlation degree, filters out redundant features that have similar categorical term frequency distributions. In the second step, the conventional IG feature selection algorithm selects the final feature set for document categorization. Experiments on the well-known Reuters-21578 and 20news-18828 corpora show that the proposed method eliminates features that have a high fuzzy correlation degree with one another and yields a compressed feature space of dramatically lower dimension. The categorization results on both corpora show that the conventional IG feature selection algorithm achieves better document categorization performance on the compressed feature space, which demonstrates the effectiveness of the proposed method.
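The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the categorical fuzzy correlation degree, so the function `fuzzy_similarity` below is a stand-in assumption (a min/max ratio over normalized per-category term frequencies), not the authors' measure. The greedy filtering order, the `threshold` and `k` parameters, the helper names, and the toy corpus are likewise hypothetical.

```python
import math


def categorical_tf(docs, labels):
    """Map each term to its term-frequency vector over the categories."""
    cats = sorted(set(labels))
    idx = {c: i for i, c in enumerate(cats)}
    tf = {}
    for tokens, label in zip(docs, labels):
        for t in tokens:
            tf.setdefault(t, [0.0] * len(cats))[idx[label]] += 1.0
    return tf


def fuzzy_similarity(u, v):
    """Stand-in for the fuzzy correlation degree (an assumption, not the
    paper's formula): sum of minima over sum of maxima of the two
    normalized categorical distributions. Ranges from 0 to 1."""
    su, sv = sum(u), sum(v)
    p = [x / su for x in u]
    q = [x / sv for x in v]
    num = sum(min(a, b) for a, b in zip(p, q))
    den = sum(max(a, b) for a, b in zip(p, q))
    return num / den


def entropy(subset):
    """Shannon entropy (bits) of a list of class labels."""
    if not subset:
        return 0.0
    counts = {}
    for label in subset:
        counts[label] = counts.get(label, 0) + 1
    n = len(subset)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(term, docs, labels):
    """IG of the binary event 'term occurs in the document'."""
    n = len(docs)
    present = [l for d, l in zip(docs, labels) if term in d]
    absent = [l for d, l in zip(docs, labels) if term not in d]
    return (entropy(labels)
            - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))


def two_step_select(docs, labels, threshold=0.9, k=2):
    tf = categorical_tf(docs, labels)
    # Step 1: greedy redundancy filter. Visit terms by descending total
    # frequency (ties broken alphabetically) and keep a term only if it is
    # not too similar to any term already kept.
    kept = []
    for t in sorted(tf, key=lambda t: (-sum(tf[t]), t)):
        if all(fuzzy_similarity(tf[t], tf[s]) < threshold for s in kept):
            kept.append(t)
    # Step 2: rank the surviving terms by IG and take the top k.
    ranked = sorted(kept, key=lambda t: (-information_gain(t, docs, labels), t))
    return kept, ranked[:k]


# Toy corpus (hypothetical): tokenized documents and their categories.
docs = [["oil", "crude", "market"], ["oil", "crude", "price"],
        ["goal", "match", "market"], ["goal", "match", "price"]]
labels = ["energy", "energy", "sport", "sport"]
kept, selected = two_step_select(docs, labels)
print(kept, selected)
```

In this toy corpus, "oil" and "crude" have identical categorical distributions, so only one of them survives step 1 before IG ranks the remaining terms. A pairwise filter like this is quadratic in vocabulary size, so a realistic corpus would likely need a cheaper candidate search.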