Improved aquila optimizer with mRMR for feature selection of high-dimensional gene expression data
Xiwen Qin, Siqi Zhang, Xiaogang Dong, Hongyu Shi, Liping Yuan
Cluster Computing, published 2024-06-20
DOI: 10.1007/s10586-024-04614-0
Abstract
Accurate classification of gene expression data is crucial for disease diagnosis and drug discovery. However, gene expression data usually contain a very large number of features, which makes accurate classification challenging. This paper proposes a novel feature selection method that combines minimal redundancy maximal relevance (mRMR) with the aquila optimizer and improves the optimizer in three ways. First, mRMR is introduced in the population initialization stage to generate high-quality initial populations. Second, a random opposition-based learning strategy is used to increase the diversity of the aquila population and accelerate the convergence of the algorithm. Third, an inertia weight is introduced into the position update formula in the late iterations of the aquila optimizer to keep the algorithm from falling into local optima and to improve its ability to find the optimum. To verify the effectiveness of the proposed method, it is evaluated on ten real gene expression datasets and compared with several meta-heuristic algorithms. Experimental results show that the proposed method significantly outperforms the other meta-heuristic algorithms in terms of fitness value, classification accuracy and the number of selected features. Compared with the original aquila optimizer, the average classification accuracy of the proposed method with KNN and SVM classifiers improves by 3.48–12.41% and 0.53–18.63%, respectively. The proposed method substantially reduces the feature dimension of gene expression data while retaining important features and achieving higher classification accuracy, providing a new approach to feature selection for gene expression data.
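The abstract says mRMR is used during population initialization but does not say how. The following is a minimal Python sketch under stated assumptions: it ranks discretized features with the greedy difference form of mRMR (relevance minus mean redundancy, both measured by mutual information) and then seeds part of a binary population with prefixes of that ranking. The helper names, the 50% seeding fraction and the prefix-based seeding rule are hypothetical illustrations, not the authors' procedure.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """I(X;Y) in nats, estimated from value counts of two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(X, y, k):
    """Greedy mRMR, difference form: relevance I(f; y) minus mean I(f; s) over selected s.

    X is a list of samples, each a list of discrete (already binned) feature values.
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    relevance = [mutual_information(col, y) for col in cols]
    # Start from the single most relevant feature.
    selected = [max(range(n_features), key=lambda j: relevance[j])]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]

        def score(j):
            redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        selected.append(max(candidates, key=score))
    return selected


def mrmr_seeded_population(n_agents, n_features, ranked, seeded_fraction=0.5, seed=0):
    """Hypothetical seeding rule: the first agents start from random-length prefixes
    of the mRMR ranking; the remaining agents are uniformly random bit masks."""
    rng = random.Random(seed)
    n_seeded = int(n_agents * seeded_fraction)
    population = []
    for i in range(n_agents):
        if i < n_seeded:
            chosen = set(ranked[: rng.randint(1, len(ranked))])
            agent = [1 if j in chosen else 0 for j in range(n_features)]
        else:
            agent = [rng.randint(0, 1) for _ in range(n_features)]
        population.append(agent)
    return population
```

Gene expression values are continuous, so in this sketch they would first have to be binned into a few discrete levels before the count-based mutual information estimate is meaningful.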
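Random opposition-based learning is commonly formulated as x_opp[j] = lb[j] + ub[j] - r * x[j], with r drawn uniformly from (0, 1). The Python sketch below assumes that formulation, draws a fresh r per dimension, clips the result to the search bounds and keeps whichever of each agent and its random opposite has the lower fitness. The abstract does not state where or how often the authors apply this step, so the greedy per-agent replacement and the function names here are assumptions.

```python
import random


def random_opposite(x, lb, ub, rng):
    """Random opposite point: lb_j + ub_j - r_j * x_j with r_j ~ U(0, 1) per dimension.

    For some bounds (e.g. lb = 2, ub = 3) this can leave [lb, ub], so it is clipped.
    """
    return [min(max(l + u - rng.random() * xi, l), u) for xi, l, u in zip(x, lb, ub)]


def robl_step(population, fitness, lb, ub, rng):
    """Replace each agent by its random opposite when the opposite has lower fitness."""
    updated = []
    for x in population:
        x_opp = random_opposite(x, lb, ub, rng)
        updated.append(x_opp if fitness(x_opp) < fitness(x) else x)
    return updated
```

Because the opposite point is only kept when it is better, this step never worsens an agent, at the cost of one extra fitness evaluation per agent.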
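The abstract does not give the modified position update, so the Python sketch below only illustrates the general idea under explicit assumptions. It treats "late iterations" as the exploitation phase of the original aquila optimizer, which begins once t > 2T/3; it uses a linearly decreasing inertia weight borrowed from particle swarm optimization (w_max = 0.9 and w_min = 0.4 are assumed values); and it applies that weight in a hypothetical, simplified move toward the best solution. This is not the authors' update formula.

```python
import random


def inertia_weight(t, T, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight (PSO-style schedule); w_max and w_min are assumed."""
    return w_max - (w_max - w_min) * t / T


def is_late_stage(t, T):
    """The original aquila optimizer switches from exploration to exploitation when t > 2T/3."""
    return t > 2 * T / 3


def exploit_with_inertia(x, x_best, t, T, lb, ub, rng):
    """Hypothetical simplified late-stage move (not the paper's update):
    damp the current position by w(t), pull it toward x_best, then clip to [lb, ub]."""
    w = inertia_weight(t, T)
    new_x = []
    for xi, bi, l, u in zip(x, x_best, lb, ub):
        v = w * xi + rng.random() * (bi - xi)
        new_x.append(min(max(v, l), u))
    return new_x
```

A decreasing weight lets agents move more freely at the start of the late stage and settle near the best solution toward the end of the run.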
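The methods are compared by fitness value, classification accuracy and number of selected features. A fitness function frequently used in wrapper feature selection, and assumed here, is alpha * error + (1 - alpha) * |S| / D, where |S| is the number of selected features and D is the total number of features. The Python sketch below estimates the error with leave-one-out k-NN on the selected features; alpha = 0.99, k = 1 and the helper names are hypothetical choices, not values reported by the authors.

```python
import math
from collections import Counter


def knn_loocv_error(X, y, mask, k=1):
    """Leave-one-out error of a k-NN classifier using only features with mask[j] == 1."""
    idx = [j for j, m in enumerate(mask) if m]
    errors = 0
    for i in range(len(X)):
        # Euclidean distance from sample i to every other sample on the selected features.
        dists = sorted(
            (math.sqrt(sum((X[i][j] - X[r][j]) ** 2 for j in idx)), r)
            for r in range(len(X))
            if r != i
        )
        votes = Counter(y[r] for _, r in dists[:k])
        prediction = votes.most_common(1)[0][0]
        errors += prediction != y[i]
    return errors / len(X)


def fitness(X, y, mask, alpha=0.99, k=1):
    """Wrapper fitness to minimize: alpha * error + (1 - alpha) * |S| / D."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 1.0  # empty subsets get the largest possible value
    error = knn_loocv_error(X, y, mask, k)
    return alpha * error + (1 - alpha) * n_selected / len(mask)
```

In a binary variant of the aquila optimizer, each agent's position would be mapped to such a mask (for example by thresholding at 0.5) before this fitness is evaluated.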