Optimizing Sequential Forward Selection on Classification using Genetic Algorithm

IF 3.3 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Informatica | Pub Date: 2023-11-08 | DOI: 10.31449/inf.v46i9.4964

Knitchepon Chotchantarakun


Citations: 0

Abstract

With the digital transformation of modern technologies, the amount of data has increased significantly, giving rise to novel knowledge discovery techniques in data analytics and data mining. Such data usually contain noise or non-informative features that affect the analysis results. Approaches for eliminating these features, known as feature selection, have been studied extensively over the past few decades. Feature selection is a significant preprocessing step of the mining process that selects only the informative features from the original feature set, and the selected features improve the efficiency of the learning model. This study proposes a sequential forward feature selection method called Forward Selection with Genetic Algorithm (FS-GA). FS-GA consists of three major steps. First, it creates preliminarily selected subsets. Second, it improves on the previous subsets. Third, it optimizes the selected subset using a genetic algorithm. In this way, it maximizes the classification accuracy as features are added. We performed experiments on ten standard UCI datasets using three popular classification models: the Decision Tree, Naive Bayes, and K-Nearest Neighbour classifiers. The results are compared with state-of-the-art methods. FS-GA achieved the best results among the sequential forward selection methods on all the tested datasets, with O(n^2) time complexity.
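
For orientation, the baseline that the abstract builds on can be sketched in a few lines. The block below is plain sequential forward selection (SFS) only; it is not FS-GA, whose subset-improvement and genetic-algorithm steps are not specified in the abstract. The toy data, the leave-one-out 1-nearest-neighbour scorer, and all function names are hypothetical and chosen for illustration. It also assumes that n in the stated complexity refers to the number of features, which the abstract does not say explicitly.

```python
from typing import Callable, List, Sequence, Tuple


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the columns listed in `features`."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def sequential_forward_selection(
    n_features: int, score: Callable[[List[int]], float]
) -> Tuple[List[Tuple[List[int], float]], int]:
    """Plain SFS: at each step, add the one remaining feature whose
    inclusion gives the highest score. Returns the subset chosen at
    every size together with the number of score evaluations."""
    selected: List[int] = []
    remaining = list(range(n_features))
    history: List[Tuple[List[int], float]] = []
    evaluations = 0
    while remaining:
        best_f, best_s = None, -1.0
        for f in remaining:
            s = score(selected + [f])
            evaluations += 1
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((list(selected), best_s))
    return history, evaluations


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0],
    [1.0, 5.1, 1.1], [1.1, 1.1, 9.1], [1.2, 8.1, 4.1],
]
y = [0, 0, 0, 1, 1, 1]

history, evaluations = sequential_forward_selection(
    3, lambda feats: loo_1nn_accuracy(X, y, feats)
)
best_subset, best_score = max(history, key=lambda item: item[1])
print(best_subset, best_score, evaluations)  # [0] 1.0 6
```

With n features the greedy loop scores n + (n - 1) + ... + 1 = n(n + 1)/2 candidate subsets (6 for n = 3 here), which is where a quadratic evaluation count for sequential forward methods comes from. The toy run also shows why such methods track accuracy at every subset size: adding the two noise columns lowers the score, so the best subset is the single informative feature.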

Source journal

Informatica (Engineering & Technology - Computer Science: Information Systems)

CiteScore

5.90

Self-citation rate

6.90%

Articles published

Review time

12 months

About the journal: The quarterly journal Informatica provides an international forum for high-quality original research and publishes papers on mathematical simulation and optimization, recognition and control, programming theory and systems, and automation systems and elements. Informatica provides a multidisciplinary forum for scientists and engineers involved in research and design, including experts who implement and manage information systems applications.