A Machine Learning Risk Prediction Model for Gastric Cancer with SHapley Additive exPlanations
Bomi Park, Chung Ho Kim, Jae Kwan Jun, Mina Suh, Kui Son Choi, Il Ju Choi, Hyun Jin Oh
Cancer Research and Treatment, published 2024-12-16. DOI: 10.4143/crt.2024.843
Abstract
Purpose: Gastric cancer (GC) prediction models hold potential for enhancing early detection by enabling the identification of high-risk individuals, facilitating personalized risk-based screening, and optimizing the allocation of healthcare resources.
Materials and methods: In this study, we developed a machine learning-based GC prediction model utilizing data from the Korean National Health Insurance Service, encompassing 10,515,949 adults who had not been diagnosed with GC and underwent GC screening during 2013-2014, with a follow-up period of at least five years. The cohort was divided into training and test datasets at an 8:2 ratio, and class imbalance was mitigated through random oversampling.
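The split-and-rebalance step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The abstract states only the 8:2 ratio and the use of random oversampling, so three choices here are assumptions: the split is stratified by outcome, oversampling is applied to the training portion only (so the test set keeps the true case prevalence), and the helper names `stratified_split` and `random_oversample` are hypothetical. The toy cohort is invented for illustration.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each outcome class at the same ratio.

    Assumption: stratification by outcome; the abstract gives only the 8:2 ratio.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def random_oversample(indices, labels, seed=42):
    """Resample minority-class indices with replacement until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(indices)  # no minority examples to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Hypothetical toy cohort: 20 incident cases among 1,000 people (not study data).
labels = [1] * 20 + [0] * 980
train_idx, test_idx = stratified_split(labels)
# Assumption: only the training indices are rebalanced; the test set is left untouched.
balanced_train = random_oversample(train_idx, labels)
```

With these toy numbers the test set holds 200 people (4 cases), and the rebalanced training set contains 784 cases and 784 non-cases, most of the cases being duplicates of the 16 original training cases. Keeping the test set unbalanced means the reported AUC reflects the real prevalence rather than the artificial 1:1 ratio.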
Results: Among various models, logistic regression demonstrated the highest predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.708, which was consistent with the AUC obtained in external validation (0.669). Importantly, the outcomes were robust to missing data imputation and variable selection. The SHapley Additive exPlanations (SHAP) algorithm enhanced the explainability of the model, identifying advancing age, being male, Helicobacter pylori infection, current smoking, and a family history of GC as key predictors of elevated risk.
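The two quantities reported above, the AUC and the SHAP attributions of a logistic regression model, can be illustrated with a short stdlib-only sketch. The AUC is computed with the rank-based (Mann-Whitney U) formula, which gives the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. For SHAP, the sketch uses the closed form for linear models under an assumed feature-independence setting: in log-odds space each feature's contribution is its coefficient times the feature's deviation from the background mean. The abstract does not say which SHAP explainer the authors used, and the coefficients, means, and person below are hypothetical values, not estimates from the study.

```python
import math

def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUC; tied scores receive their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tied block
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def log_odds(intercept, coefs, x):
    """Linear predictor (log-odds) of a logistic regression model."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def predicted_risk(intercept, coefs, x):
    """Predicted probability: logistic (sigmoid) transform of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coefs, x)))

def linear_shap(coefs, x, background_means):
    """SHAP values in log-odds space for a linear model, assuming independent features."""
    return [b * (xi - mu) for b, xi, mu in zip(coefs, x, background_means)]

# Hypothetical model and person; these numbers are NOT estimates from the study.
intercept = -3.0
coefs = [0.5, 0.8, 0.3]
background_means = [5.0, 0.5, 0.4]
person = [6.0, 1.0, 1.0]

phi = linear_shap(coefs, person, background_means)
base_value = log_odds(intercept, coefs, background_means)  # prediction at the background mean
risk = predicted_risk(intercept, coefs, person)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The attributions are additive: `base_value + sum(phi)` equals the person's log-odds, which is what allows a SHAP summary to rank predictors by how far they push individual risk above or below the average. Because the AUC sketch relies on sorting, it runs in O(n log n) time, whereas a naive all-pairs comparison would be impractical at the scale of a national cohort.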
Conclusion: This predictive model could significantly contribute to the early identification of individuals at elevated risk for gastric cancer, thereby enabling the implementation of targeted preventive strategies. Furthermore, the integration of noninvasive and cost-effective predictors enhances the clinical utility of the model, supporting its potential application in routine healthcare settings.
About the journal:
Cancer Research and Treatment is a peer-reviewed open access publication of the Korean Cancer Association. It is published quarterly, with one volume per year, under the abbreviated title Cancer Res Treat. It accepts manuscripts relevant to experimental and clinical cancer research. Subjects include carcinogenesis, tumor biology, molecular oncology, cancer genetics, tumor immunology, epidemiology, predictive markers and cancer prevention, pathology, cancer diagnosis, screening, and therapies including chemotherapy, surgery, radiation therapy, immunotherapy, gene therapy, multimodality treatment, and palliative care.