Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

Journal: Information Discovery and Delivery (IF 2.1; JCR Q2, Information Science & Library Science)
Published: 2019-08-19
DOI: 10.1108/IDD-09-2018-0045
J. Balakumar, S. Mohan
Citations: 14

Abstract

Purpose
The internet holds a huge volume of documents, so text classification has become a necessary task for handling them. Feature selection is an important stage in achieving good classification results: it reduces the dimensionality of text documents by choosing a suitable subset of features. The main purpose of this work is to classify documents stored on a personal computer according to their content.

Design/methodology/approach
This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to improve text classification accuracy. ABCFS is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To evaluate the proposed algorithm, both a support vector machine (SVM) and an improved SVM classifier are used.

Findings
Experiments were conducted on real and benchmark data sets. The real data set consists of documents stored on a personal computer. The benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results show that the proposed feature selection algorithm improves text document classification accuracy.

Originality/value
This paper proposes a new ABCFS algorithm for feature selection, evaluates its efficiency and improves the support vector machine. Here, ABCFS is used to select features from text (unstructured) documents. In existing work, the artificial bee colony algorithm has been used to select features only from structured data, and no such algorithm existed for text feature selection. The proposed approach classifies documents automatically according to their content.
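The abstract does not say how ABCFS encodes or scores candidate feature subsets. As a rough illustration only, the sketch below makes three assumptions. It uses a common binary encoding, with one bit per candidate term. Employed and onlooker bees use a one-bit-flip neighbourhood. Scout bees abandon a food source after more than `limit` consecutive unsuccessful trials. The names `abc_feature_selection`, `toy_fitness` and `INFORMATIVE`, and all parameter values, are hypothetical. In the paper's setting, the fitness would more plausibly be the accuracy of an SVM trained on the selected features. A toy scoring function stands in for it here so the example has no dependencies.

```python
import random
from typing import Callable, List

Mask = List[int]  # one bit per candidate feature (e.g. vocabulary term): 1 = keep


def abc_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    n_sources: int = 10,
    limit: int = 20,
    max_cycles: int = 100,
    seed: int = 0,
) -> Mask:
    """Binary artificial bee colony search for a high-fitness feature mask.

    Hypothetical sketch: each food source is a bit mask, a neighbour differs
    by one flipped bit, and higher fitness is better.
    """
    rng = random.Random(seed)

    def random_mask() -> Mask:
        return [rng.randint(0, 1) for _ in range(n_features)]

    def neighbour(mask: Mask) -> Mask:
        new = mask[:]
        j = rng.randrange(n_features)
        new[j] = 1 - new[j]  # include or exclude one feature
        return new

    foods = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in foods]
    trials = [0] * n_sources
    best_mask, best_score = foods[0][:], scores[0]

    def remember(i: int) -> None:
        # Keep a copy of the best food source seen so far.
        nonlocal best_mask, best_score
        if scores[i] > best_score:
            best_mask, best_score = foods[i][:], scores[i]

    def try_improve(i: int) -> None:
        candidate = neighbour(foods[i])
        score = fitness(candidate)
        if score > scores[i]:  # greedy selection keeps the better source
            foods[i], scores[i], trials[i] = candidate, score, 0
            remember(i)
        else:
            trials[i] += 1

    for i in range(n_sources):
        remember(i)

    for _ in range(max_cycles):
        # Employed bees: one neighbourhood search per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        # Scout bees: abandon sources that failed to improve more than `limit` times in a row.
        for i in range(n_sources):
            if trials[i] > limit:
                foods[i] = random_mask()
                scores[i] = fitness(foods[i])
                trials[i] = 0
                remember(i)
    return best_mask


# Toy fitness (hypothetical): features 1, 4 and 7 are informative, and every
# selected feature costs 0.1, mimicking a preference for smaller subsets.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best = abc_feature_selection(n_features=10, fitness=toy_fitness)
print([j for j, bit in enumerate(best) if bit])
```

With this toy fitness, the search should settle on features 1, 4 and 7. That is the only subset that keeps every informative feature and pays for no others. To use a real fitness, you could substitute, for example, the cross-validated accuracy of a classifier restricted to the masked columns. The search loop stays the same, but each fitness call then costs one classifier evaluation.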
Source journal
Information Discovery and Delivery (Information Science & Library Science)
CiteScore: 5.40
Self-citation rate: 4.80%
Articles published: 21
About the journal: Information Discovery and Delivery covers information discovery and access for digital information researchers. This includes educators, knowledge professionals in education and cultural organisations, knowledge managers in media, health care and government, as well as librarians. The journal publishes research and practice that explores the digital information supply chain, i.e. transport, flows, tracking, exchange and sharing, including within and between libraries. It is also interested in digital information capture, packaging and storage by "collectors" of all kinds. Information is widely defined, including but not limited to records, documents, learning objects, visual and sound files, data and metadata, and user-generated content.
Latest articles in this journal
- Visualizing the evolution of touchscreen research by scientometric analysis
- Analyzing user sentiments toward selected content management software: a sentiment analysis of viewer's comments on YouTube
- Usability testing of a website through different devices: a task-based approach in a public university setting in Bangladesh
- Exploring Information Systems (IS) curricula: a semantic analysis approach
- Examines the value of cloud computing adoption as a proxy for IT flexibility and effectiveness