Individual risk and prognostic value prediction by interpretable machine learning for distant metastasis in neuroblastoma: A population-based study and an external validation
Shan Li, Jinkui Wang, Zhaoxia Zhang, Chunnian Ren, Dawei He
International Journal of Medical Informatics, volume 196, Article 105813. Published 2025-01-29. DOI: 10.1016/j.ijmedinf.2025.105813. Impact factor: 3.7 (JCR Q2, Computer Science, Information Systems). Not open access.
https://www.sciencedirect.com/science/article/pii/S1386505625000309
Citation count: 0
Abstract
Purpose
Neuroblastoma (NB) is a childhood malignancy with a poor prognosis and a propensity for distant metastasis (DM). We aimed to establish machine learning (ML)-based models to accurately predict the risk of DM and the prognosis of NB patients with DM.
Methods
We analyzed NB patients from the Surveillance, Epidemiology, and End Results (SEER) database diagnosed between 2000 and 2020. Univariate and multivariate logistic regression analyses were employed to select meaningful variables. A Recursive Feature Elimination (RFE) method based on 6 ML algorithms was used for feature selection. To construct the predictive model, 13 ML algorithms were evaluated by area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, cross-entropy, Brier score, balanced accuracy and F-beta score. An optimal ML model was constructed to predict DM, and its predictions were explained with the SHapley Additive exPlanations (SHAP) framework. In addition, 101 ML algorithm combinations were developed, and the combination with the highest C-index was selected to predict the prognosis of NB patients with DM.
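As a minimal sketch of several of the evaluation metrics named above (not the authors' code; the abstract does not describe their implementation), the following pure-Python functions compute AUC, Brier score, cross-entropy, and the threshold-based metrics. All function names, the 0.5 decision threshold, and the eight-patient toy data are hypothetical and serve only to illustrate the definitions.

```python
import math

def auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0

def threshold_metrics(y_true, y_prob, threshold=0.5, beta=1.0):
    """Confusion-matrix metrics at an assumed decision threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    b2 = beta ** 2
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f_beta": _ratio((1 + b2) * precision * sensitivity,
                         b2 * precision + sensitivity),
    }

# Hypothetical toy data: 1 = distant metastasis, 0 = none.
toy_y = [1, 1, 1, 1, 0, 0, 0, 0]
toy_p = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(toy_y, toy_p))                  # 13 of 16 pairs ranked correctly
    print("Brier:", round(brier(toy_y, toy_p), 4))
    print("Cross-entropy:", round(cross_entropy(toy_y, toy_p), 4))
    print(threshold_metrics(toy_y, toy_p))
```

AUC, the Brier score and cross-entropy use the predicted probabilities directly, whereas accuracy, sensitivity, specificity, precision, balanced accuracy and F-beta depend on the chosen threshold, which is one reason a model comparison typically reports both kinds of metric.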
Results
A total of 1,668 NB patients from the SEER database were consecutively enrolled. Tumor primary site, grade, surgery type, regional lymph nodes, radiotherapy and chemotherapy were identified as significant risk factors for DM. The CatBoost model was selected as the best prediction model, with an AUC of 0.846 (95% CI: [0.804, 0.899]), 0.834 (95% CI: [0.796, 0.873]) and 0.813 (95% CI: [0.776, 0.852]) in the training, internal test and external test sets, respectively, and with 0.777 accuracy, 0.839 sensitivity, 0.72 specificity and 0.731 precision in the training set. According to the SHAP results, grade, chemotherapy and radiotherapy had the greatest effects on DM. For prognosis prediction, the “RSF + GBM” combination was the best prognostic model, with a C-index of 0.656, 0.611 and 0.629 in the training, internal test and external test sets, respectively.
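To make the C-index values above easier to interpret, here is a minimal sketch of Harrell's concordance index in pure Python. It is not the authors' implementation, and the abstract does not say which C-index variant they used; the function name and the five-patient toy data are hypothetical. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: among comparable pairs, the fraction in which the
    patient with the shorter observed time has the higher risk score.

    A pair (i, j) is comparable when patient i has the shorter time and
    experienced the event (events[i] == 1). Pairs with tied times are
    skipped here, which is a simplification of this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied risk scores get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy data: follow-up time, event flag (1 = death observed,
# 0 = censored), and a model's predicted risk score.
toy_times = [5, 10, 15, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.5, 0.1]

if __name__ == "__main__":
    # 8 comparable pairs, 6 of them ordered correctly.
    print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.75
```

Censored patients contribute only as the longer-surviving member of a pair, which is why the metric can be computed even when many outcomes are not observed.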
Conclusions
Our ML models demonstrate excellent accuracy and reliability, offering more precise, personalized metastasis diagnosis and prognostic prediction for NB patients.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.