Evolutionary Sparsity Regularisation-based Feature Selection for Binary Classification.

Evolutionary Computation | IF 4.6 | Zone 2 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence) | Pub Date: 2024-08-22 | DOI: 10.1162/evco_a_00358

Bach Hoai Nguyen, Bing Xue, Mengjie Zhang

Publication type: Journal Article | Pages: 1-33

Citations: 0

Abstract

In classification, feature selection is an essential pre-processing step that selects a small subset of features to improve classification performance. Existing feature selection approaches fall into three main categories: wrapper, filter, and embedded approaches. Compared with the other two, embedded approaches usually offer a better trade-off between classification performance and computation time. One of the best-known embedded approaches is sparsity regularisation-based feature selection, which generates sparse solutions for feature selection. Despite its good performance, sparsity regularisation-based feature selection outputs only a feature ranking, so the number of selected features must be predefined. More importantly, the ranking mechanism introduces a risk of ignoring feature interactions, which leads to many top-ranked but redundant features being selected. This work addresses these problems by proposing a new representation that considers the interactions between features and can automatically determine an appropriate number of selected features. The proposed representation is used in a differential evolution (DE) algorithm to optimise the feature subset. In addition, a novel initialisation mechanism is proposed to let DE consider various numbers of selected features at the beginning. The proposed algorithm is examined on both synthetic and real-world datasets. The results on the synthetic dataset show that the proposed algorithm can select complementary features, whereas existing sparsity regularisation-based feature selection algorithms are at risk of selecting redundant features. The results on real-world datasets show that the proposed algorithm achieves better classification performance than well-known wrapper, filter, and embedded approaches. The algorithm is also as efficient as filter feature selection approaches.
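The abstract does not describe the proposed representation or initialisation in detail, so the following is only a minimal sketch of the general idea of using DE to search for a feature subset. Everything in it is an assumption made for illustration rather than the authors' method: the threshold-based decoding of a real-valued vector, the nearest-centroid fitness with a size penalty, the toy dataset (one informative feature, its duplicate, one complementary feature, and three noise features), and all function names and parameter values. The DE variant used is the standard DE/rand/1/bin scheme.

```python
import random

def make_data(n, rng):
    """Toy data (assumed): f0 and f2 are complementary, f1 duplicates f0, f3..f5 are noise."""
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([a, a, b] + [rng.gauss(0, 1) for _ in range(3)])
        y.append(1 if a + b > 0 else 0)
    return X, y

def decode(vec, threshold=0.5):
    """Assumed decoding: feature j is selected when vec[j] exceeds the threshold."""
    return [v > threshold for v in vec]

def fitness(vec, X, y, alpha=0.02):
    """Nearest-centroid training error plus a small penalty on subset size (lower is better)."""
    idx = [j for j, keep in enumerate(decode(vec)) if keep]
    if not idx:
        return float("inf")  # an empty subset cannot classify anything
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    errors = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, cents[c])) for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        errors += pred != t
    return errors / len(X) + alpha * len(idx) / len(vec)

def init_population(pop_size, dim, rng):
    """Assumed initialisation: individual i starts with 1 + (i % dim) selected features."""
    pop = []
    for i in range(pop_size):
        chosen = set(rng.sample(range(dim), 1 + i % dim))
        pop.append([rng.uniform(0.6, 1.0) if j in chosen else rng.uniform(0.0, 0.4)
                    for j in range(dim)])
    return pop

def differential_evolution(X, y, pop_size=20, gens=30, F=0.5, CR=0.9, seed=0):
    """Standard DE/rand/1/bin with greedy one-to-one replacement."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = init_population(pop_size, dim, rng)
    fit = [fitness(ind, X, y) for ind in pop]
    initial_best = min(fit)
    for _ in range(gens):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated position
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            f_trial = fitness(trial, X, y)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best], initial_best

X, y = make_data(200, random.Random(42))
mask, best_fit, initial_best = differential_evolution(X, y)
print("selected features:", [j for j, keep in enumerate(mask) if keep])
print("best fitness:", round(best_fit, 3), "initial best:", round(initial_best, 3))
```

Because the fitness scores a whole subset rather than individual features, a subset containing both f0 and its duplicate f1 pays the size penalty without gaining accuracy, which is the kind of redundancy a pure ranking cannot detect. Starting individuals with different subset sizes is one plausible reading of "consider various numbers of selected features at the beginning"; the paper's actual mechanism may differ.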




Source journal

Evolutionary Computation (Engineering and Technology: Computer Science, Theory and Methods)

CiteScore: 6.40

Self-citation rate: 1.50%

Review time: 3 months

About the journal: Evolutionary Computation is a leading journal in its field. It provides an international forum for facilitating and enhancing the exchange of information among researchers involved in both the theoretical and practical aspects of computational systems drawing their inspiration from nature, with particular emphasis on evolutionary models of computation such as genetic algorithms, evolutionary strategies, classifier systems, evolutionary programming, and genetic programming. It welcomes articles from related fields such as swarm intelligence (e.g. Ant Colony Optimization and Particle Swarm Optimization), and other nature-inspired computation paradigms (e.g. Artificial Immune Systems). As well as publishing articles describing theoretical and/or experimental work, the journal also welcomes application-focused papers describing breakthrough results in an application domain or methodological papers where the specificities of the real-world problem led to significant algorithmic improvements that could possibly be generalized to other areas.