Development of PDAC diagnosis and prognosis evaluation models based on machine learning
Authors: Yingqi Xiao, Shixin Sun, Naxin Zheng, Jing Zhao, Xiaohan Li, Jianmin Xu, Haolian Li, Chenran Du, Lijun Zeng, Juling Zhang, Xiuyun Yin, Yuan Huang, Xuemei Yang, Fang Yuan, Xingwang Jia, Boan Li, Bo Li
Journal: BMC Cancer, vol. 25, no. 1, p. 512 (published 2025-03-20; impact factor 3.4; JCR Q2, Oncology)
DOI: 10.1186/s12885-025-13929-z (https://doi.org/10.1186/s12885-025-13929-z)
Citations: 0
Abstract
Background: Pancreatic ductal adenocarcinoma (PDAC) is highly aggressive and difficult to detect early, which often leads to a poor prognosis. Existing serum biomarkers such as CA19-9 have limited value for early diagnosis and do not meet clinical needs. Machine learning (ML) and deep learning (DL) technologies have shown great potential in biomedicine. This study aims to establish PDAC differential diagnosis and prognosis assessment models that combine ML with serum biomarkers for early diagnosis, risk stratification, and personalized treatment recommendations, with the goal of improving early diagnosis rates and patient survival.
Methods: The study included serum biomarker data and prognosis information from 117 PDAC patients. Four ML models, Random Forest (RF), Neural Network (NNET), Support Vector Machine (SVM), and Gradient Boosting Machine (GBM), were used for differential diagnosis and evaluated by accuracy, the Kappa statistic, the ROC curve, sensitivity, and specificity. A Cox proportional hazards model and the DeepSurv DL model were used to predict survival risk and were compared by C-index and log-rank test. Personalized treatment recommendations were made from DeepSurv's risk predictions, and their effectiveness was assessed.
Results: Effective PDAC diagnosis and prognosis models were built using ML. On the validation set, the accuracies of the RF, NNET, SVM, and GBM models were 84.21%, 84.21%, 76.97%, and 83.55%; the sensitivities were 91.26%, 90.29%, 89.32%, and 88.35%; the specificities were 69.39%, 71.43%, 51.02%, and 73.47%; the Kappa values were 0.6266, 0.6307, 0.4336, and 0.6215; and the AUCs were 0.889, 0.8488, 0.8488, and 0.8704, respectively. Cox regression selected BCAT1, AMY, and CA12-5 as modeling parameters for the prognosis model. DeepSurv outperformed the Cox model on both the training and validation sets, with C-indexes of 0.738 and 0.724, respectively. The Kaplan-Meier survival curves indicate that personalized treatment recommendations based on DeepSurv can help patients achieve survival benefits.
Conclusion: This study built efficient PDAC diagnosis and prognosis models using ML, improving early diagnosis rates and prognosis accuracy. The DeepSurv model excelled in prognosis prediction and successfully guided personalized treatment recommendations, supporting PDAC clinical management.
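As a worked illustration of how the threshold-based diagnostic metrics relate to one another, the sketch below computes accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 confusion matrix. The counts used (94 true positives, 9 false negatives, 34 true negatives, 15 false positives) are an assumption: the abstract does not report them, and they were chosen only because they reproduce the percentages reported for the RF model on the validation set. The function name `diagnostic_metrics` is hypothetical, and the reported Kappa value is assumed here to be Cohen's kappa.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity and Cohen's kappa for a 2x2 table."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n            # observed agreement
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    # Agreement expected by chance, from the predicted and actual marginals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# ASSUMED counts (not stated in the source), consistent with the RF figures.
rf = diagnostic_metrics(tp=94, fn=9, tn=34, fp=15)
for name, value in rf.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the four values round to 84.21%, 91.26%, 69.39%, and 0.6266. Kappa is noticeably lower than accuracy because it discounts the agreement that the class imbalance alone would produce.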
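The C-index used to compare the Cox and DeepSurv models can be illustrated with a minimal sketch of Harrell's concordance index. This is a plain pairwise count under stated assumptions, not the authors' implementation: a higher predicted risk is assumed to mean shorter survival, pairs with tied survival times are skipped, and the function name `concordance_index` and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher = expected shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third subject is censored at time 12.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 4 of 5 comparable pairs are concordant -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indexes of 0.738 and 0.724 should be read.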
About the journal:
BMC Cancer is an open access, peer-reviewed journal that considers articles on all aspects of cancer research, including the pathophysiology, prevention, diagnosis and treatment of cancers. The journal welcomes submissions concerning molecular and cellular biology, genetics, epidemiology, and clinical trials.