Optimizing Sequential Forward Selection on Classification using Genetic Algorithm

IF 3.3; Zone 4 (Computer Science); JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS; Informatica; Pub Date: 2023-11-08; DOI: 10.31449/inf.v46i9.4964
Knitchepon Chotchantarakun

Abstract

With the digital transformation of modern technologies, the amount of data has increased significantly, prompting novel knowledge discovery techniques in data analytics and data mining. Such data usually contain noise or non-informative features that degrade the analysis results. Approaches for eliminating these features, known as feature selection, have been studied extensively over the past few decades. Feature selection is a significant preprocessing step of the mining process that selects only the informative features from the original feature set, and the selected features improve the efficiency of the learning model. This study proposes a sequential forward feature selection method called Forward Selection with Genetic Algorithm (FS-GA). FS-GA consists of three major steps. First, it creates preliminary selected subsets. Second, it improves on those subsets. Third, it optimizes the selected subset using a genetic algorithm, thereby maximizing the classification accuracy as features are added. We performed experiments on ten standard UCI datasets using three popular classification models: Decision Tree, Naive Bayes, and K-Nearest Neighbour. The results are compared with state-of-the-art methods. FS-GA achieved the best results among the sequential forward selection methods compared on all tested datasets, with O(n^2) time complexity.
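
The abstract does not spell out the individual steps, so the following is only a minimal sketch of the general idea, not the author's FS-GA: a plain greedy forward selection followed by a small elitist genetic search over subsets of the same size. Every name here (`forward_selection`, `ga_refine`, `toy_score`, `WEIGHTS`) is hypothetical, the GA operators (swap mutation, union crossover, keep-the-best-half selection) are assumptions, and `toy_score` is a synthetic stand-in for a classifier's accuracy on a feature subset. The second step of FS-GA is not described in enough detail to sketch and is omitted.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Sequence

Score = Callable[[FrozenSet[int]], float]

# Hypothetical per-feature contributions; features 1 and 2 are only
# strong together, which is the case greedy selection tends to miss.
WEIGHTS = [0.5, 0.3, 0.3, 0.1, 0.05]


def toy_score(subset: FrozenSet[int]) -> float:
    """Synthetic stand-in for classification accuracy (an assumption)."""
    bonus = 0.4 if {1, 2} <= subset else 0.0
    return sum(WEIGHTS[f] for f in subset) + bonus


def forward_selection(n_features: int, k: int, score: Score) -> List[int]:
    """Greedy SFS: add, one at a time, the feature that scores best."""
    selected: List[int] = []
    while len(selected) < k:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
    return selected


def ga_refine(n_features: int, start: Sequence[int], score: Score,
              generations: int = 30, pop_size: int = 20,
              seed: int = 0) -> FrozenSet[int]:
    """Toy elitist GA over subsets of size len(start).

    The starting subset is always kept among the survivors unless
    something better replaces it, so the result never scores worse.
    """
    rng = random.Random(seed)
    k = len(start)

    def mutate(subset: Iterable[int]) -> FrozenSet[int]:
        # Swap one selected feature for one unselected feature.
        s = set(subset)
        outside = [f for f in range(n_features) if f not in s]
        if not s or not outside:
            return frozenset(s)
        s.remove(rng.choice(sorted(s)))
        s.add(rng.choice(outside))
        return frozenset(s)

    def crossover(a: FrozenSet[int], b: FrozenSet[int]) -> FrozenSet[int]:
        # Draw k features from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), k))

    population = [frozenset(start)]
    population += [mutate(start) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    greedy = forward_selection(len(WEIGHTS), 2, toy_score)
    refined = ga_refine(len(WEIGHTS), greedy, toy_score)
    print("greedy :", sorted(greedy), toy_score(frozenset(greedy)))
    print("refined:", sorted(refined), toy_score(refined))
```

In this toy setting the greedy pass commits to feature 0 first and cannot undo that choice, while the genetic search is free to swap it out. Note also that the greedy loop alone, run until all n features are selected, scores n + (n-1) + ... + 1 = n(n+1)/2 candidate subsets, which is quadratic in n; the cost of the genetic step depends on the population size and number of generations chosen.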
Source journal: Informatica
Category: Engineering and Technology, Computer Science: Information Systems
CiteScore: 5.90
Self-citation rate: 6.90%
Articles published: 19
Review time: 12 months
Journal description: The quarterly journal Informatica provides an international forum for high-quality original research and publishes papers on mathematical simulation and optimization, recognition and control, programming theory and systems, and automation systems and elements. Informatica provides a multidisciplinary forum for scientists and engineers involved in research and design, including experts who implement and manage information systems applications.