Machine learning predictive models and risk factors for lymph node metastasis in non-small cell lung cancer
Authors: Bo Wu, Yihui Zhu, Zhuozheng Hu, Jiajun Wu, Weijun Zhou, Maoyan Si, Xiying Cao, Zhicheng Wu, Wenxiong Zhang
Journal: BMC Pulmonary Medicine, 24(1), 526
Published: 2024-10-22
DOI: https://doi.org/10.1186/s12890-024-03345-7
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11515794/pdf/
Citations: 0
Abstract
Background: The prognosis of non-small cell lung cancer (NSCLC) is substantially affected by lymph node metastasis (LNM), but there are no noninvasive, inexpensive methods of relatively high accuracy available to predict LNM in NSCLC patients.
Methods: Clinical data on NSCLC patients were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Risk factors for LNM were identified using LASSO and multivariate logistic regression. Six machine learning predictive models were constructed from these risk factors. The area under the receiver operating characteristic curve (AUC) was used to assess model performance. Subgroup analysis by T stage was performed on the optimal model. A web-based LNM risk calculator for the optimal model was built on the Shinyapps.io platform.
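To make the AUC metric named in the Methods concrete, here is a minimal sketch in plain Python. It uses the rank interpretation of AUC (the probability that a randomly chosen patient with LNM receives a higher predicted score than a randomly chosen patient without LNM). The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study, and the authors' actual software is not described in the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = LNM present, in this illustration)
    scores: predicted risk scores, same length as labels
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from SEER or the paper).
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for a demonstration; for a cohort of this size a sort-based rank computation would be the more practical choice.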
Results: We enrolled 64,012 NSCLC patients, of whom 26,611 (41.57%) had LNM. Multivariate logistic regression identified 10 independent risk factors for LNM: age, sex, race, histology, primary site, grade, T stage, M stage, tumor size, and bone metastases. The generalized linear model (GLM) was the best-performing of the six machine learning models in both the training and validation cohorts. Subgroup analyses showed that the GLM retained good predictive performance across T stages. A web-based LNM risk calculator based on the GLM is available on the shinyapps.io platform ( https://wubopredict.shinyapps.io/dynnomapp/ ).
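As a rough illustration of how a GLM-based risk calculator can turn patient characteristics into a probability, the sketch below assumes a binomial GLM with a logit link, which the abstract does not state explicitly. The function `predicted_probability`, the feature names, and every coefficient value are hypothetical placeholders, not the fitted model from the paper. The last lines simply re-check the reported LNM proportion from the two counts given above.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic (logit-link) GLM: probability = 1 / (1 + exp(-linear predictor))."""
    eta = intercept + sum(coefficients[name] * value
                          for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients, chosen only to show the mechanics.
# They are NOT estimates from the study.
example_coefficients = {"tumor_size_cm": 0.10, "bone_metastasis": 0.50}
example_intercept = -1.0

smaller_tumor = predicted_probability(
    example_intercept, example_coefficients,
    {"tumor_size_cm": 2.0, "bone_metastasis": 0})
larger_tumor = predicted_probability(
    example_intercept, example_coefficients,
    {"tumor_size_cm": 5.0, "bone_metastasis": 0})

# Arithmetic check of the reported proportion with LNM.
lnm_percent = round(100 * 26611 / 64012, 2)
print(lnm_percent)
```

With positive placeholder coefficients the predicted probability rises as the feature value rises, which is the only behavior this sketch is meant to show.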
Conclusion: The GLM-based predictive model can be used to precisely predict the probability of LNM in NSCLC patients, and it proved effective in all subgroup analyses by T stage.
About the journal:
BMC Pulmonary Medicine is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of pulmonary and associated disorders, as well as related molecular genetics, pathophysiology, and epidemiology.