Tao Wang, LiYun Jia, JiaLing Xu, Ahmed G. Gad, Hai Ren, Ahmed Salem
DOI: 10.1007/s13042-024-02292-3
Journal: International Journal of Machine Learning and Cybernetics, Volume 26, Issue 1
Published: 2024-09-05
Citations: 0
A hybrid intelligent optimization algorithm to select discriminative genes from large-scale medical data
Identifying disease-related genes is an ongoing research problem in biomedical analysis. Many recent studies have proposed strategies for predicting disease-related genes, but only a handful can identify or select relevant genes with a low computational burden. To tackle this issue, we introduce a new filter–wrapper gene selection (GS) method that combines metaheuristic algorithms (MHAs) with the k-nearest neighbors (k-NN) classifier. Specifically, we hybridize two MHAs, the bat algorithm (BA) and the JAYA algorithm (JA), and embed a new perturbation-based exploration strategy (PES) to obtain the JAYA–bat algorithm (JBA). Relevant genes are first pre-selected with a filter method based on mutual information (MI) and then refined by JBA, which selects a near-optimal gene subset in a timely fashion. On 12 high-dimensional microarray datasets (ranging from 2000 to 22,283 features, i.e., genes), JBA outperforms 10 state-of-the-art GS methods. Compared with 11 well-known original MHAs, including BA and JA, the proposed JBA achieves significantly better results, with improvement rates of 12.36%, 12.45%, 97.88%, 9.84%, 12.45%, and 12.17% in terms of fitness, accuracy, gene selection ratio, precision, recall, and F1-score, respectively. Wilcoxon's signed-rank test at a significance level of α = 0.05 further confirms the superiority of JBA over its peers on most of the datasets. The use of PES and the combination of BA's and JA's strengths appear to enhance JBA's exploration and exploitation capabilities. This gives JBA a significant advantage in gene selection ratio while also yielding the highest classification accuracy and the lowest computational time among all competing algorithms. This research could therefore make a significant contribution to the field of biomedical analysis.
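The abstract does not give JBA's update equations, its PES operator, or its exact fitness function. As a hedged illustration of the general filter–wrapper pattern it describes, the sketch below ranks genes by mutual information, keeps the top ones, and then searches binary gene masks with a plain JAYA update, scoring each mask by leave-one-out k-NN accuracy. This is not the authors' JBA: the bat-algorithm components are omitted, and the helper names (`mutual_information`, `mi_filter`, `loo_knn_accuracy`, `fitness`, `jaya_select`, `filter_wrapper`), the equal-width 3-bin MI estimate, the 0.99 weight on classification error, the 0.5 threshold that turns positions into a gene mask, the single-position perturbation, and the population and iteration sizes are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(values, labels, bins=3):
    """MI (in nats) between one gene, discretized into equal-width bins, and the labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant gene: fall back to a single bin
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(labels)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def mi_filter(X, y, top_k):
    """Filter stage (assumed form): indices of the top_k genes ranked by MI."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]


def loo_knn_accuracy(X, y, features, k=1):
    """Leave-one-out k-NN accuracy (squared Euclidean distance) on the chosen genes."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in features), y[j])
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)


def fitness(mask, X, y, w_err=0.99):
    """Wrapper fitness to minimize (hypothetical weights): k-NN error plus gene ratio."""
    features = [j for j, bit in enumerate(mask) if bit]
    error = 1.0 - loo_knn_accuracy(X, y, features)
    return w_err * error + (1 - w_err) * len(features) / len(mask)


def jaya_select(X, y, pop_size=10, iters=30, p_perturb=0.2, seed=0):
    """Binary JAYA search over gene masks with a simple random perturbation."""
    rng = random.Random(seed)
    d = len(X[0])

    def to_mask(pos):
        return [int(v > 0.5) for v in pos]  # threshold positions in [0, 1]

    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    fits = [fitness(to_mask(p), X, y) for p in pop]
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fits.__getitem__)]
        worst = pop[max(range(pop_size), key=fits.__getitem__)]
        for i, x in enumerate(pop):
            # JAYA move: toward the best solution and away from the worst, clipped to [0, 1].
            cand = [
                min(max(x[j] + rng.random() * (best[j] - abs(x[j]))
                        - rng.random() * (worst[j] - abs(x[j])), 0.0), 1.0)
                for j in range(d)
            ]
            if rng.random() < p_perturb:
                # Perturbation (hypothetical): mirror one position around 0.5,
                # which usually toggles that gene in or out of the mask.
                j = rng.randrange(d)
                cand[j] = 1.0 - cand[j]
            f = fitness(to_mask(cand), X, y)
            if f <= fits[i]:  # greedy acceptance, as in standard JAYA
                pop[i], fits[i] = cand, f
    i_best = min(range(pop_size), key=fits.__getitem__)
    return to_mask(pop[i_best]), fits[i_best]


def filter_wrapper(X, y, top_k):
    """MI filter first, then the JAYA wrapper on the reduced gene set."""
    genes = mi_filter(X, y, top_k)
    X_reduced = [[row[j] for j in genes] for row in X]
    mask, fit = jaya_select(X_reduced, y)
    return [g for g, bit in zip(genes, mask) if bit], fit


if __name__ == "__main__":
    # Toy data: gene 0 separates the two classes; genes 1-3 are uniform noise.
    data_rng = random.Random(1)
    labels = [0] * 10 + [1] * 10
    X = [[5.0 * c + data_rng.random()] + [data_rng.random() for _ in range(3)] for c in labels]
    print(filter_wrapper(X, labels, top_k=3))
```

On the toy data, the returned gene list should contain gene 0, the only informative gene, together with the best fitness found. Because the filter is univariate and cheap, the costly k-NN wrapper only ever evaluates masks over `top_k` genes, which mirrors the low-computational-burden motivation the abstract gives for running the MI filter before the metaheuristic.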
Journal description:
Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data.
The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations, and case studies pertaining to all aspects of machine learning and cybernetics fall within the scope of IJMLC.
Key research areas to be covered by the journal include:
Machine Learning for modeling interactions between systems
Pattern Recognition technology to support discovery of system-environment interaction
Control of system-environment interactions
Biochemical interaction in biological and biologically-inspired systems
Learning for improvement of communication schemes between systems