A Machine Learning Risk Prediction Model for Gastric Cancer with SHapley Additive exPlanations.

IF 4.1 · Zone 2 (Medicine) · JCR Q2 (Oncology) · Cancer Research and Treatment · Pub Date: 2024-12-16 · DOI: 10.4143/crt.2024.843
Bomi Park, Chung Ho Kim, Jae Kwan Jun, Mina Suh, Kui Son Choi, Il Ju Choi, Hyun Jin Oh

Abstract

Purpose: Gastric cancer (GC) prediction models hold potential for enhancing early detection by enabling the identification of high-risk individuals, facilitating personalized risk-based screening, and optimizing the allocation of healthcare resources.

Materials and methods: In this study, we developed a machine learning-based GC prediction model utilizing data from the Korean National Health Insurance Service, encompassing 10,515,949 adults who had not been diagnosed with GC and underwent GC screening during 2013-2014, with a follow-up period of at least five years. The cohort was divided into training and test datasets at an 8:2 ratio, and class imbalance was mitigated through random oversampling.
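The split-then-oversample step described above can be sketched in a few lines. This is a minimal illustration on synthetic labels, not the authors' pipeline: the abstract does not say whether the 8:2 split was stratified or whether oversampling was restricted to the training set, so both are assumptions here, and the function names `stratified_split` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately so both
    parts keep roughly the original case/non-case ratio (an assumption; the
    abstract only states an 8:2 ratio)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(indices, labels, seed=0):
    """Resample minority-class rows with replacement until every class has
    as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced


# Synthetic labels (1 = incident GC, 0 = no GC); the 5% prevalence is made up.
labels = [1] * 50 + [0] * 950

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
# Oversample the training rows only, so the test set keeps its natural
# class ratio and no duplicated row can leak into evaluation.
balanced_train = random_oversample(train_idx, labels)

train_counts = Counter(labels[i] for i in balanced_train)
test_counts = Counter(labels[i] for i in test_idx)
print("balanced training set:", dict(train_counts))
print("untouched test set:", dict(test_counts))
```

Oversampling after the split, rather than before it, is the usual design choice because copies of the same minority row would otherwise appear on both sides of the split and inflate the test AUC.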

Results: Among various models, logistic regression demonstrated the highest predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.708, which was consistent with the AUC obtained in external validation (0.669). Importantly, the outcomes were robust to missing data imputation and variable selection. The SHapley Additive exPlanations (SHAP) algorithm enhanced the explainability of the model, identifying advancing age, being male, Helicobacter pylori infection, current smoking, and a family history of GC as key predictors of elevated risk.
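To make the two quantities in the results concrete, the sketch below computes an AUC from its rank-based definition and SHAP values for a logistic regression on the log-odds scale. For a linear model with features treated as independent, the SHAP value of feature j reduces to w_j · (x_j − mean_j). All coefficients, feature encodings, and background means below are invented for illustration and are not taken from the paper; `auc`, `linear_shap`, and `log_odds` are hypothetical helper names.

```python
import math


def auc(scores, labels):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_odds(weights, bias, x):
    """Linear predictor of a logistic regression model."""
    return bias + sum(w * xj for w, xj in zip(weights, x))


def linear_shap(weights, x, background_mean):
    """SHAP values on the log-odds scale for a linear model, assuming
    independent features: phi_j = w_j * (x_j - mean_j)."""
    return [w * (xj - m) for w, xj, m in zip(weights, x, background_mean)]


# Toy AUC check: 5 of the 6 case/non-case pairs are ranked correctly.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)

# Hypothetical model: names mirror the predictors listed in the abstract,
# but every number here is made up.
feature_names = ["age_decades", "male", "h_pylori", "current_smoker", "family_history"]
weights = [0.5, 0.6, 0.4, 0.3, 0.35]
bias = -7.0
background_mean = [5.5, 0.45, 0.3, 0.2, 0.1]

person = [7.0, 1, 1, 0, 1]  # a 70-year-old male, H. pylori positive, non-smoker
phi = linear_shap(weights, person, background_mean)
base_value = log_odds(weights, bias, background_mean)
prediction = log_odds(weights, bias, person)

for name, value in zip(feature_names, phi):
    print(f"{name:>15}: {value:+.3f}")
print(f"base value {base_value:.3f} + sum(phi) {sum(phi):.3f} = {prediction:.3f}")
print(f"predicted risk: {1 / (1 + math.exp(-prediction)):.4f}")
```

The last printed line of the loop illustrates the additivity property that gives SHAP its name: the base value plus the per-feature contributions equals the model's log-odds output for that individual. A positive contribution pushes the predicted risk up and a negative one (here, not smoking) pushes it down.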

Conclusion: This predictive model could significantly contribute to the early identification of individuals at elevated risk for gastric cancer, thereby enabling the implementation of targeted preventive strategies. Furthermore, the integration of noninvasive and cost-effective predictors enhances the clinical utility of the model, supporting its potential application in routine healthcare settings.

Source journal: Cancer Research and Treatment
CiteScore: 8.00
Self-citation rate: 2.20%
Articles published: 126
Review time: >12 weeks
Journal description: Cancer Research and Treatment is a peer-reviewed open access publication of the Korean Cancer Association. It is published quarterly, one volume per year. Its abbreviated title is Cancer Res Treat. It accepts manuscripts relevant to experimental and clinical cancer research. Subjects include carcinogenesis, tumor biology, molecular oncology, cancer genetics, tumor immunology, epidemiology, predictive markers and cancer prevention, pathology, cancer diagnosis, screening, and therapies including chemotherapy, surgery, radiation therapy, immunotherapy, gene therapy, multimodality treatment, and palliative care.