Online feature subset selection for mining feature streams in big data via incremental learning and evolutionary computation
Authors: Yelleti Vivek, Vadlamani Ravi, P. Radha Krishna
Journal: Swarm and Evolutionary Computation, volume 94, Article 101896 (impact factor 8.2; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-02-26
DOI: 10.1016/j.swevo.2025.101896
URL: https://www.sciencedirect.com/science/article/pii/S2210650225000549
Citations: 0
Abstract
Online streaming feature subset selection (OSFSS) is challenging when data samples arrive rapidly and in a time-dependent manner, and it becomes harder still when the features themselves arrive as a stream. Despite several attempts to solve OSFSS over feature streams, existing methods lack scalability, cannot handle interaction effects among features, and fail to handle high-velocity feature streams efficiently. To address these challenges, we propose a novel wrapper for OSFSS, named OSFSS-W, specifically designed to mine feature streams within the Apache Spark environment. The proposed method incorporates (i) two vigilance tests, which remove (a) irrelevant features and (b) redundant features; (ii) incremental learning; and (iii) a tolerance-based feedback mechanism that retains and reuses previous knowledge while adhering to predefined tolerance thresholds. For optimization, we introduce a Bare Bones Particle Swarm Optimization algorithm driven by the logistic distribution (BBPSO-L), which is parallelized within Apache Spark following an island-based approach. We evaluated the proposed algorithm on datasets from the cybersecurity, bioinformatics, and finance domains. The results demonstrate that combining the two vigilance tests with the tolerance-based feedback mechanism significantly improved both the median area under the receiver operating characteristic curve (AUC) and the median cardinality (the number of selected features) across all datasets.
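The abstract names BBPSO-L but does not give its update rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the classic bare-bones PSO update (each coordinate is resampled around the midpoint of the particle's personal best and the global best, with spread equal to their absolute difference) and simply swaps the usual Gaussian for a logistic distribution. The 0.5 threshold used to turn positions into feature masks, the function names, and the toy fitness that stands in for a wrapper's AUC score are all hypothetical. The Spark parallelization, the island model, and the vigilance tests are not shown.

```python
import numpy as np


def toy_fitness(mask, relevant=(0, 3, 5), penalty=0.1):
    """Hypothetical stand-in for a wrapper score: reward the assumed
    relevant features and penalize subset cardinality."""
    hits = sum(mask[i] for i in relevant)
    return hits - penalty * mask.sum()


def bbpso_logistic(fitness, n_features, n_particles=20, n_iters=50, seed=0):
    """Bare-bones PSO with logistic sampling (assumed form of BBPSO-L).

    Returns (best_mask, best_score), maximizing `fitness`.
    """
    rng = np.random.default_rng(seed)
    # Continuous positions in [0, 1]; a feature counts as selected
    # when its coordinate exceeds 0.5 (assumed binarization rule).
    pos = rng.random((n_particles, n_features))
    pbest = pos.copy()
    pbest_score = np.array([fitness(p > 0.5) for p in pos])
    g = int(np.argmax(pbest_score))
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    for _ in range(n_iters):
        # Bare-bones update: no velocity, no inertia or acceleration terms.
        loc = (pbest + gbest) / 2.0
        scale = np.abs(pbest - gbest) + 1e-12  # keep the scale positive
        pos = np.clip(rng.logistic(loc, scale), 0.0, 1.0)

        scores = np.array([fitness(p > 0.5) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]

        g = int(np.argmax(pbest_score))
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    return gbest > 0.5, float(gbest_score)


if __name__ == "__main__":
    mask, score = bbpso_logistic(toy_fitness, n_features=8, seed=1)
    print("selected features:", np.flatnonzero(mask), "score:", score)
```

The logistic distribution has heavier tails than a Gaussian of comparable spread, so a sampler of this kind occasionally proposes positions farther from the current bests; whether that is the motivation in the paper is not stated in the abstract.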
About the journal:
Swarm and Evolutionary Computation is a peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence, and it also welcomes survey papers on current topics and novel applications. Topics of interest include, but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.