A robust ensemble feature selection approach to prioritize genes associated with survival outcome in high-dimensional gene expression data

Phi Le, Xingyue Gong, Leah Ung, Hai Yang, Bridget P Keenan, Li Zhang, Tao He
Frontiers in Systems Biology, published 2024-03-21. DOI: 10.3389/fsysb.2024.1355595 (https://doi.org/10.3389/fsysb.2024.1355595)

Abstract

Identifying features associated with a clinical outcome of interest is a rapidly advancing area of research. Because contemporary sequencing technologies can measure thousands of genes per sample, building prediction models that balance accuracy and resource use is challenging. Feature selection methods have been developed to improve performance, reduce overfitting, and conserve resources. However, applying feature selection to survival analysis introduces unique challenges, particularly in clinical datasets with substantial censoring and limited sample sizes. We propose a robust ensemble feature selection approach integrated with group Lasso to identify informative features, and we evaluate its performance in predicting survival outcomes. In extensive simulations, our approach consistently outperforms established models across several criteria, achieving low false discovery rates, high sensitivity, and high stability. We further applied the approach to a colorectal cancer dataset from The Cancer Genome Atlas, where a composite score built from the selected genes correctly distinguished patient subtypes. In summary, the proposed approach selects impactful features from high-dimensional data and yields better results than contemporary state-of-the-art models.
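The abstract does not spell out how the ensemble aggregates its runs or how the simulation criteria are computed. As a rough illustration only, the sketch below assumes a common ensemble scheme: a base selector (for example a group-Lasso-penalized Cox model, not implemented here) is run on many resampled datasets, a feature is kept if it is selected in at least a hypothetical fraction `threshold` of runs, and false discovery rate, sensitivity, and stability are computed against a known set of true signal genes, as in a simulation. The composite score is assumed to be a weighted sum of the selected genes' expression. All function names, the threshold, and the stability measure (mean pairwise Jaccard similarity) are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations

# Hypothetical helpers; a sketch under stated assumptions, not the authors' method.
# Each element of `selected_sets` is the set of genes chosen by a base selector
# (assumed, e.g., a group-Lasso Cox fit) on one resampled dataset.

def selection_frequencies(selected_sets, features):
    """Fraction of resampled runs in which each feature was selected."""
    n_runs = len(selected_sets)
    return {f: sum(f in s for s in selected_sets) / n_runs for f in features}

def ensemble_select(selected_sets, features, threshold=0.5):
    """Keep features selected in at least `threshold` of runs (assumed cutoff)."""
    freq = selection_frequencies(selected_sets, features)
    return {f for f, p in freq.items() if p >= threshold}

def fdr_and_sensitivity(selected, true_signals):
    """FDR = false selections / all selections; sensitivity = true selections / true signals."""
    selected, true_signals = set(selected), set(true_signals)
    tp = len(selected & true_signals)
    fdr = (len(selected) - tp) / len(selected) if selected else 0.0
    sensitivity = tp / len(true_signals) if true_signals else 0.0
    return fdr, sensitivity

def jaccard_stability(selected_sets):
    """Mean pairwise Jaccard similarity between runs (one common stability measure)."""
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        return 1.0

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    return sum(jaccard(set(a), set(b)) for a, b in pairs) / len(pairs)

def composite_score(expression, weights):
    """Assumed linear score: weighted sum of the selected genes' expression values."""
    return sum(w * expression[g] for g, w in weights.items())

if __name__ == "__main__":
    features = ["g1", "g2", "g3", "g4", "g5"]
    # Toy selections from four hypothetical resampling runs.
    runs = [{"g1", "g2"}, {"g1", "g2", "g4"}, {"g1", "g3"}, {"g1", "g2"}]
    final = ensemble_select(runs, features, threshold=0.5)
    print(sorted(final))                                   # ['g1', 'g2']
    print(fdr_and_sensitivity(final, {"g1", "g2", "g3"}))
    print(round(jaccard_stability(runs), 3))               # 0.542
```

In practice the base selector, the resampling scheme, the cutoff, and the score weights would all come from the method itself; the toy runs only show how frequency-based aggregation and the evaluation criteria fit together.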