Wangduk Seo, Jaegyun Park, Sanghyuck Lee, A-Seong Moon, Dae-Won Kim, Jaesung Lee
{"title":"Memetic multilabel feature selection using pruned refinement process","authors":"Wangduk Seo, Jaegyun Park, Sanghyuck Lee, A-Seong Moon, Dae-Won Kim, Jaesung Lee","doi":"10.1186/s40537-024-00961-2","DOIUrl":null,"url":null,"abstract":"<p>With the growing complexity of data structures, which include high-dimensional and multilabel datasets, the significance of feature selection has become more emphasized. Multilabel feature selection endeavors to identify a subset of features that concurrently exhibit relevance across multiple labels. Owing to the impracticality of performing exhaustive searches to obtain the optimal feature subset, conventional approaches in multilabel feature selection often resort to a heuristic search process. In this context, memetic multilabel feature selection has received considerable attention because of its superior search capability; the fitness of the feature subset created by the stochastic search is further enhanced through a refinement process predicated on the employed multilabel feature filter. Thus, it is imperative to employ an effective refinement process that frequently succeeds in improving the target feature subset to maximize the benefits of hybridization. However, the refinement process in conventional memetic multilabel feature selection often overlooks potential biases in feature scores and compatibility issues between the multilabel feature filter and the subsequent learner. Consequently, conventional methods may not effectively identify the optimal feature subset in complex multilabel datasets. In this study, we propose a new memetic multilabel feature selection method that addresses these limitations by incorporating the pruning of features and labels into the refinement process. The effectiveness of the proposed method was demonstrated through experiments on 14 multilabel datasets.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"25 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-024-00961-2","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
With the growing complexity of data structures, which include high-dimensional and multilabel datasets, the significance of feature selection has become more emphasized. Multilabel feature selection endeavors to identify a subset of features that concurrently exhibit relevance across multiple labels. Owing to the impracticality of performing exhaustive searches to obtain the optimal feature subset, conventional approaches in multilabel feature selection often resort to a heuristic search process. In this context, memetic multilabel feature selection has received considerable attention because of its superior search capability; the fitness of the feature subset created by the stochastic search is further enhanced through a refinement process predicated on the employed multilabel feature filter. Thus, it is imperative to employ an effective refinement process that frequently succeeds in improving the target feature subset to maximize the benefits of hybridization. However, the refinement process in conventional memetic multilabel feature selection often overlooks potential biases in feature scores and compatibility issues between the multilabel feature filter and the subsequent learner. Consequently, conventional methods may not effectively identify the optimal feature subset in complex multilabel datasets. In this study, we propose a new memetic multilabel feature selection method that addresses these limitations by incorporating the pruning of features and labels into the refinement process. The effectiveness of the proposed method was demonstrated through experiments on 14 multilabel datasets.
期刊介绍:
The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.