Machine learning predictive models and risk factors for lymph node metastasis in non-small cell lung cancer.
Authors: Bo Wu, Yihui Zhu, Zhuozheng Hu, Jiajun Wu, Weijun Zhou, Maoyan Si, Xiying Cao, Zhicheng Wu, Wenxiong Zhang
Journal: BMC Pulmonary Medicine, 24(1):526 (Journal Article, published 2024-10-22)
Impact factor: 2.6; JCR: Q2 (Respiratory System)
DOI: 10.1186/s12890-024-03345-7 (https://doi.org/10.1186/s12890-024-03345-7)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11515794/pdf/
Citation count: 0
Abstract
Background: The prognosis of non-small cell lung cancer (NSCLC) is substantially affected by lymph node metastasis (LNM), but no noninvasive, inexpensive, and reasonably accurate method is available to predict LNM in NSCLC patients.
Methods: Clinical data on NSCLC patients were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Risk factors for LNM were identified using LASSO and multivariate logistic regression. Six machine learning predictive models were constructed from these risk factors. The area under the receiver operating characteristic curve (AUC) was used to assess the performance of each model. Subgroup analysis by T stage was performed on the optimal model. A web-based LNM risk calculator for the optimal model was built on the shinyapps.io platform.
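The AUC evaluation and the T-stage subgroup analysis described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `auc` and `auc_by_subgroup` are hypothetical, the toy labels, scores, and stages are made up rather than taken from SEER, and AUC is computed with the pairwise (Mann-Whitney) formulation, which may differ from the software the authors used.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_by_subgroup(labels, scores, groups):
    """Compute AUC separately within each subgroup (for example, each T stage)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


# Made-up toy data, NOT from SEER: 1 = LNM present, 0 = LNM absent.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
toy_stage = ["T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2"]

overall_auc = auc(toy_labels, toy_scores)
stage_auc = auc_by_subgroup(toy_labels, toy_scores, toy_stage)
print(overall_auc, stage_auc)
```

The pairwise loop is quadratic in the number of patients, so it suits only small examples; for a cohort of tens of thousands, a rank-based computation of the same statistic would be the practical choice.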
Results: We enrolled 64,012 NSCLC patients, of whom 26,611 (41.57%) had LNM. Using multivariate logistic regression, we identified 10 independent risk factors for LNM: age, sex, race, histology, primary site, grade, T stage, M stage, tumor size, and bone metastases. GLM was the best-performing of the six machine learning models in both the training and validation cohorts. Subgroup analyses showed that GLM retained good predictive performance across T-stage subgroups. A web-based LNM risk calculator based on GLM was published on the shinyapps.io platform ( https://wubopredict.shinyapps.io/dynnomapp/ ).
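To illustrate how a GLM-based risk calculator can turn patient features into a probability, here is a minimal sketch. It assumes that GLM refers to a generalized linear model with a logit link (logistic regression), which the abstract does not state explicitly. The function name `lnm_probability`, the feature names, the intercept, and every coefficient are hypothetical and are not the fitted values from the paper or its web calculator. The last line only re-derives the prevalence reported above.

```python
import math


def lnm_probability(intercept, coefficients, features):
    """Logit-link GLM: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only; NOT the paper's fitted values.
example_intercept = -1.0
example_coefficients = {
    "tumor_size_cm": 0.2,
    "t_stage_t3_or_t4": 0.8,
    "bone_metastases": 0.5,
}

# Hypothetical patient: 3 cm tumor, T3/T4 disease, no bone metastases.
example_patient = {"tumor_size_cm": 3.0, "t_stage_t3_or_t4": 1, "bone_metastases": 0}
risk = lnm_probability(example_intercept, example_coefficients, example_patient)

# Cohort prevalence reported in the abstract: 26,611 of 64,012 patients had LNM.
prevalence_percent = round(100 * 26611 / 64012, 2)
print(round(risk, 4), prevalence_percent)
```

In a real calculator the categorical risk factors (sex, race, histology, primary site, grade, T stage, M stage) would each be encoded as indicator variables with their own fitted coefficients.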
Conclusion: The GLM-based predictive model can be used to precisely predict the probability of LNM in NSCLC patients, and it proved effective in all subgroup analyses by T stage.
About the journal:
BMC Pulmonary Medicine is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of pulmonary and associated disorders, as well as related molecular genetics, pathophysiology, and epidemiology.