An efficient interpretable stacking ensemble model for lung cancer prognosis

IF 3.1 4区生物学 Q2 BIOLOGY Computational Biology and Chemistry Pub Date : 2024-12-01 Epub Date: 2024-10-15 DOI:10.1016/j.compbiolchem.2024.108248

Umair Arif , Chunxia Zhang , Sajid Hussain , Abdul Rauf Abbasi

{"title":"An efficient interpretable stacking ensemble model for lung cancer prognosis","authors":"Umair Arif , Chunxia Zhang , Sajid Hussain , Abdul Rauf Abbasi","doi":"10.1016/j.compbiolchem.2024.108248","DOIUrl":null,"url":null,"abstract":"<div><div>Lung cancer significantly contributes to global cancer mortality, posing challenges in clinical management. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low, Medium, and High-risk categories. The bootstrap method was employed for evaluation metrics, while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) assessed model interpretability. Results showed SEM's superior interpretability over traditional models, such as Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90 %, precision of 98.70 %, F1 score of 98.85 %, sensitivity of 98.77 %, specificity of 95.45 %, Cohen’s kappa value of 94.56 %, and an AUC of 98.10 %. The SEM demonstrated robust performance in lung cancer prognosis, revealing chronic lung cancer and genetic risk as major factors.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"113 ","pages":"Article 108248"},"PeriodicalIF":3.1000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1476927124002366","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/15 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Lung cancer significantly contributes to global cancer mortality, posing challenges in clinical management. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low, Medium, and High-risk categories. The bootstrap method was employed for evaluation metrics, while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) assessed model interpretability. Results showed SEM's superior interpretability over traditional models, such as Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90 %, precision of 98.70 %, F1 score of 98.85 %, sensitivity of 98.77 %, specificity of 95.45 %, Cohen’s kappa value of 94.56 %, and an AUC of 98.10 %. The SEM demonstrated robust performance in lung cancer prognosis, revealing chronic lung cancer and genetic risk as major factors.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

肺癌预后的高效可解释堆积集合模型

肺癌在全球癌症死亡率中占很大比例，给临床治疗带来了挑战。早期发现和准确预后对改善患者预后至关重要。本研究为肺癌预后预测开发了一种可解释的堆叠集合模型（SEM），并确定了关键风险因素。该模型使用包含 22 个变量的 1000 名患者的 Kaggle 数据集，将预后分为低、中和高风险类别。评估指标采用了自举法，而 SHAP（夏普利相加解释）和 LIME（本地可解释模型-不可知论解释）评估了模型的可解释性。结果显示，SEM 的可解释性优于随机森林、逻辑回归、决策树、梯度提升机、极梯度提升机和轻梯度提升机等传统模型。SEM 的准确度为 98.90 %，精确度为 98.70 %，F1 分数为 98.85 %，灵敏度为 98.77 %，特异度为 95.45 %，Cohen's kappa 值为 94.56 %，AUC 为 98.10 %。SEM 在肺癌预后方面表现出色，显示慢性肺癌和遗传风险是主要因素。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Computational Biology and Chemistry 生物-计算机：跨学科应用

CiteScore

6.10

自引率

3.20%

发文量

142

审稿时长

24 days

期刊介绍： Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.