Feature Selection with β-Hill Climbing Search for Text Clustering Application

L. Abualigah, A. Khader, M. Al-Betar, Zaid Abdi Alkareem Alyasseri, O. Alomari, Essam Said Hanandeh
Published in: 2017 Palestinian International Conference on Information and Communication Technology (PICICT), 2017-05-01
DOI: 10.1109/PICICT.2017.30 (https://doi.org/10.1109/PICICT.2017.30)
Citations: 42

Abstract

As the volume of text information grows, handling it has become increasingly complicated. Text clustering is a suitable technique for dealing with large numbers of text documents by grouping them into clusters. However, text documents contain sparse, non-uniformly distributed, and uninformative features, which make them difficult to cluster. Text feature selection is a primary unsupervised learning method used to choose a new subset of informative text features. In this paper, a new algorithm based on the β-hill climbing technique is proposed for the text feature selection problem in order to improve text clustering (B-FSTC). The results of the proposed β-hill climbing method and of the original hill climbing method (H-FSTC) are evaluated using k-means text clustering and compared with each other. Experiments were conducted on four standard text datasets with varying characteristics. The proposed β-hill climbing algorithm obtains superior results in comparison with the other well-regarded techniques by producing a new subset of informative text features. Lastly, the β-hill climbing-based feature selection method helps the k-means clustering algorithm achieve more precise clusters.
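The abstract does not spell out the search procedure or the objective function, so the following is only a minimal sketch of how a β-hill climbing search over binary feature masks might look. It assumes the commonly described form of β-hill climbing: a neighbourhood move, followed by a β-operator that resets each variable at random with probability beta, followed by greedy acceptance. The names `beta_hill_climbing`, `toy_score`, and `TOY_WEIGHTS`, the single-bit-flip neighbourhood, and the toy objective are all illustrative assumptions and are not taken from the paper.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = keep the feature, 0 = drop it


def beta_hill_climbing(
    score: Callable[[Mask], float],
    n_features: int,
    beta: float = 0.05,
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximise `score` over binary feature masks (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from a random feature subset.
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_score = score(current)

    for _ in range(iterations):
        candidate = current[:]

        # Neighbourhood move (assumed form): flip one randomly chosen bit.
        i = rng.randrange(n_features)
        candidate[i] = 1 - candidate[i]

        # Beta-operator: with probability beta, reset each bit to a random
        # value. This is what lets the search escape local optima.
        for j in range(n_features):
            if rng.random() < beta:
                candidate[j] = rng.randint(0, 1)

        # Greedy acceptance: keep the candidate only if it is not worse.
        candidate_score = score(candidate)
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score

    return current, current_score


# Hypothetical per-feature usefulness weights, for illustration only.
TOY_WEIGHTS = [0.9, -0.4, 0.7, -0.2, 0.5, -0.8, 0.3, -0.1]


def toy_score(mask: Mask) -> float:
    """Toy objective: sum of the weights of the selected features."""
    return sum(w for w, bit in zip(TOY_WEIGHTS, mask) if bit)


if __name__ == "__main__":
    mask, best = beta_hill_climbing(toy_score, n_features=len(TOY_WEIGHTS))
    print("selected mask:", mask, "score:", best)
```

In this sketch, setting `beta=0.0` disables the random reset and leaves a plain hill climbing search, which is the kind of baseline the abstract refers to as H-FSTC. In a real text clustering setting, `toy_score` would be replaced by a measure of feature-subset quality computed from the document-term matrix, and the selected features would then be passed to k-means.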