An efficient interpretable stacking ensemble model for lung cancer prognosis

IF 2.6 4区 生物学 Q2 BIOLOGY Computational Biology and Chemistry Pub Date : 2024-10-15 DOI:10.1016/j.compbiolchem.2024.108248
Umair Arif , Chunxia Zhang , Sajid Hussain , Abdul Rauf Abbasi
{"title":"An efficient interpretable stacking ensemble model for lung cancer prognosis","authors":"Umair Arif ,&nbsp;Chunxia Zhang ,&nbsp;Sajid Hussain ,&nbsp;Abdul Rauf Abbasi","doi":"10.1016/j.compbiolchem.2024.108248","DOIUrl":null,"url":null,"abstract":"<div><div>Lung cancer significantly contributes to global cancer mortality, posing challenges in clinical management. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low, Medium, and High-risk categories. The bootstrap method was employed for evaluation metrics, while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) assessed model interpretability. Results showed SEM's superior interpretability over traditional models, such as Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90 %, precision of 98.70 %, F1 score of 98.85 %, sensitivity of 98.77 %, specificity of 95.45 %, Cohen’s kappa value of 94.56 %, and an AUC of 98.10 %. The SEM demonstrated robust performance in lung cancer prognosis, revealing chronic lung cancer and genetic risk as major factors.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108248"},"PeriodicalIF":2.6000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1476927124002366","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Lung cancer significantly contributes to global cancer mortality, posing challenges in clinical management. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low, Medium, and High-risk categories. The bootstrap method was employed for evaluation metrics, while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) assessed model interpretability. Results showed SEM's superior interpretability over traditional models, such as Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90 %, precision of 98.70 %, F1 score of 98.85 %, sensitivity of 98.77 %, specificity of 95.45 %, Cohen’s kappa value of 94.56 %, and an AUC of 98.10 %. The SEM demonstrated robust performance in lung cancer prognosis, revealing chronic lung cancer and genetic risk as major factors.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
肺癌预后的高效可解释堆积集合模型
肺癌在全球癌症死亡率中占很大比例,给临床治疗带来了挑战。早期发现和准确预后对改善患者预后至关重要。本研究为肺癌预后预测开发了一种可解释的堆叠集合模型(SEM),并确定了关键风险因素。该模型使用包含 22 个变量的 1000 名患者的 Kaggle 数据集,将预后分为低、中和高风险类别。评估指标采用了自举法,而 SHAP(夏普利相加解释)和 LIME(本地可解释模型-不可知论解释)评估了模型的可解释性。结果显示,SEM 的可解释性优于随机森林、逻辑回归、决策树、梯度提升机、极梯度提升机和轻梯度提升机等传统模型。SEM 的准确度为 98.90 %,精确度为 98.70 %,F1 分数为 98.85 %,灵敏度为 98.77 %,特异度为 95.45 %,Cohen's kappa 值为 94.56 %,AUC 为 98.10 %。SEM 在肺癌预后方面表现出色,显示慢性肺癌和遗传风险是主要因素。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computational Biology and Chemistry
Computational Biology and Chemistry 生物-计算机:跨学科应用
CiteScore
6.10
自引率
3.20%
发文量
142
审稿时长
24 days
期刊介绍: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.
期刊最新文献
scSFCL:Deep clustering of scRNA-seq data with subspace feature confidence learning A structure-based approach to discover a potential isomerase Pin1 inhibitor for cancer therapy using computational simulation and biological studies A multi-perspective deep learning framework for enhancer characterization and identification Vector embeddings by sequence similarity and context for improved compression, similarity search, clustering, organization, and manipulation of cDNA libraries MuSE: A deep learning model based on multi-feature fusion for super-enhancer prediction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1