基于 SLI 和遗传算法的微阵列数据集混合特征选择。

IF 2.5 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Supercomputing Pub Date : 2022-01-01 Epub Date: 2022-06-30 DOI:10.1007/s11227-022-04650-w
Sedighe Abasabadi, Hossein Nematzadeh, Homayun Motameni, Ebrahim Akbari
{"title":"基于 SLI 和遗传算法的微阵列数据集混合特征选择。","authors":"Sedighe Abasabadi, Hossein Nematzadeh, Homayun Motameni, Ebrahim Akbari","doi":"10.1007/s11227-022-04650-w","DOIUrl":null,"url":null,"abstract":"<p><p>One of the major problems in microarray datasets is the large number of features, which causes the issue of \"the curse of dimensionality\" when machine learning is applied to these datasets. Feature selection refers to the process of finding optimal feature set by removing irrelevant and redundant features. It has a significant role in pattern recognition, classification, and machine learning. In this study, a new and efficient hybrid feature selection method, called Ga<sub>rank&rand</sub>, is presented. The method combines a wrapper feature selection algorithm based on the genetic algorithm (GA) with a proposed filter feature selection method, SLI-<i>γ</i>. In Ga<sub>rank&rand</sub>, some initial solutions are built regarding the most relevant features based on SLI-<i>γ</i>, and the remaining ones are only the random features. Eleven high-dimensional and standard datasets were used for the accuracy evaluation of the proposed SLI-<i>γ</i>. Additionally, four high-dimensional well-known datasets of microarray experiments were used to carry out an extensive experimental study for the performance evaluation of Ga<sub>rank&rand</sub>. This experimental analysis showed the robustness of the method as well as its ability to obtain highly accurate solutions at the earlier stages of the GA evolutionary process. Finally, the performance of Ga<sub>rank&rand</sub> was also compared to the results of GA to highlight its competitiveness and its ability to successfully reduce the original feature set size and execution time.</p>","PeriodicalId":50034,"journal":{"name":"Journal of Supercomputing","volume":"78 18","pages":"19725-19753"},"PeriodicalIF":2.5000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9244444/pdf/","citationCount":"0","resultStr":"{\"title\":\"Hybrid feature selection based on SLI and genetic algorithm for microarray datasets.\",\"authors\":\"Sedighe Abasabadi, Hossein Nematzadeh, Homayun Motameni, Ebrahim Akbari\",\"doi\":\"10.1007/s11227-022-04650-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>One of the major problems in microarray datasets is the large number of features, which causes the issue of \\\"the curse of dimensionality\\\" when machine learning is applied to these datasets. Feature selection refers to the process of finding optimal feature set by removing irrelevant and redundant features. It has a significant role in pattern recognition, classification, and machine learning. In this study, a new and efficient hybrid feature selection method, called Ga<sub>rank&rand</sub>, is presented. The method combines a wrapper feature selection algorithm based on the genetic algorithm (GA) with a proposed filter feature selection method, SLI-<i>γ</i>. In Ga<sub>rank&rand</sub>, some initial solutions are built regarding the most relevant features based on SLI-<i>γ</i>, and the remaining ones are only the random features. Eleven high-dimensional and standard datasets were used for the accuracy evaluation of the proposed SLI-<i>γ</i>. Additionally, four high-dimensional well-known datasets of microarray experiments were used to carry out an extensive experimental study for the performance evaluation of Ga<sub>rank&rand</sub>. This experimental analysis showed the robustness of the method as well as its ability to obtain highly accurate solutions at the earlier stages of the GA evolutionary process. Finally, the performance of Ga<sub>rank&rand</sub> was also compared to the results of GA to highlight its competitiveness and its ability to successfully reduce the original feature set size and execution time.</p>\",\"PeriodicalId\":50034,\"journal\":{\"name\":\"Journal of Supercomputing\",\"volume\":\"78 18\",\"pages\":\"19725-19753\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9244444/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Supercomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11227-022-04650-w\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/6/30 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Supercomputing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11227-022-04650-w","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/6/30 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

摘要

微阵列数据集的主要问题之一是特征数量庞大,这导致机器学习应用于这些数据集时出现 "维度诅咒 "问题。特征选择是指通过去除无关特征和冗余特征来找到最佳特征集的过程。它在模式识别、分类和机器学习中发挥着重要作用。本研究提出了一种名为 Garank&rand 的新型高效混合特征选择方法。该方法结合了基于遗传算法(GA)的包装特征选择算法和建议的过滤特征选择方法 SLI-γ。在 Garank&rand 中,根据 SLI-γ 建立了一些与最相关特征有关的初始解,其余的只是随机特征。11 个高维标准数据集用于评估 SLI-γ 的准确性。此外,还使用了四个著名的高维微阵列实验数据集,对 Garank&rand 的性能评估进行了广泛的实验研究。实验分析表明了该方法的鲁棒性以及在 GA 进化过程的早期阶段获得高精度解的能力。最后,Garank&rand 的性能还与 GA 的结果进行了比较,以突出其竞争力及其成功减少原始特征集大小和执行时间的能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

摘要图片

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Hybrid feature selection based on SLI and genetic algorithm for microarray datasets.

One of the major problems in microarray datasets is the large number of features, which causes the issue of "the curse of dimensionality" when machine learning is applied to these datasets. Feature selection refers to the process of finding optimal feature set by removing irrelevant and redundant features. It has a significant role in pattern recognition, classification, and machine learning. In this study, a new and efficient hybrid feature selection method, called Garank&rand, is presented. The method combines a wrapper feature selection algorithm based on the genetic algorithm (GA) with a proposed filter feature selection method, SLI-γ. In Garank&rand, some initial solutions are built regarding the most relevant features based on SLI-γ, and the remaining ones are only the random features. Eleven high-dimensional and standard datasets were used for the accuracy evaluation of the proposed SLI-γ. Additionally, four high-dimensional well-known datasets of microarray experiments were used to carry out an extensive experimental study for the performance evaluation of Garank&rand. This experimental analysis showed the robustness of the method as well as its ability to obtain highly accurate solutions at the earlier stages of the GA evolutionary process. Finally, the performance of Garank&rand was also compared to the results of GA to highlight its competitiveness and its ability to successfully reduce the original feature set size and execution time.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Supercomputing
Journal of Supercomputing 工程技术-工程:电子与电气
CiteScore
6.30
自引率
12.10%
发文量
734
审稿时长
13 months
期刊介绍: The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs. Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.
期刊最新文献
Topic sentiment analysis based on deep neural network using document embedding technique. A Fechner multiscale local descriptor for face recognition. Data quality model for assessing public COVID-19 big datasets. BTDA: Two-factor dynamic identity authentication scheme for data trading based on alliance chain. Driving behavior analysis and classification by vehicle OBD data using machine learning.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1