Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models.

Tuberculosis and Respiratory Diseases (IF 2.5, JCR Q2, Respiratory System) | Pub Date: 2023-07-01 | DOI: 10.4046/trd.2022.0048

Oh Beom Kwon, Solji Han, Hwa Young Lee, Hye Seon Kang, Sung Kyoung Kim, Ju Sang Kim, Chan Kwon Park, Sang Haak Lee, Seung Joon Kim, Jin Woo Kim, Chang Dong Yeo

Tuberculosis and Respiratory Diseases, vol. 86, no. 3, pp. 203-215. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/85/a1/trd-2022-0048.PMC10323210.pdf

Citations: 0

Abstract



Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models.

Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models.

Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second (FEV1) measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets.
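The evaluation protocol described in the Methods (70:30 hold-out split, five-fold cross-validation for hyperparameter tuning, then R² and MSE on the test data) can be illustrated with a minimal sketch. This is not the authors' code. It assumes synthetic data, a single predictor, and a one-feature Ridge regression standing in for the six models named above; every function and variable name is hypothetical, and only the Python standard library is used.

```python
import random
from statistics import fmean

def split_70_30(rows, seed=0):
    """Shuffle the rows and return (train, test) at a 70:30 ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]

def k_folds(rows, k=5):
    """Yield (fit, validation) pairs for k-fold cross-validation."""
    for i in range(k):
        val = rows[i::k]
        fit = [r for j, r in enumerate(rows) if j % k != i]
        yield fit, val

def fit_ridge_1d(rows, alpha):
    """Closed-form ridge fit for one feature; the intercept is unpenalized."""
    mx = fmean(x for x, _ in rows)
    my = fmean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / (sxx + alpha)  # larger alpha shrinks the slope toward 0
    return my - slope * mx, slope

def predict(model, x):
    intercept, slope = model
    return intercept + slope * x

def mse(model, rows):
    """Mean squared error of the model on the given rows."""
    return fmean((y - predict(model, x)) ** 2 for x, y in rows)

def r2(model, rows):
    """Coefficient of determination of the model on the given rows."""
    my = fmean(y for _, y in rows)
    ss_res = sum((y - predict(model, x)) ** 2 for x, y in rows)
    ss_tot = sum((y - my) ** 2 for _, y in rows)
    return 1.0 - ss_res / ss_tot

def tune_alpha(train, alphas, k=5):
    """Return the alpha with the lowest mean validation MSE over k folds."""
    def cv_mse(alpha):
        return fmean(mse(fit_ridge_1d(fit, alpha), val)
                     for fit, val in k_folds(train, k))
    return min(alphas, key=cv_mse)

# Synthetic stand-in data (an assumption, NOT the study's patient data):
# x plays the role of a preoperative measurement, y the 6-month outcome.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(40, 120)
    y = 10 + 0.8 * x + rng.gauss(0, 8)
    data.append((x, y))

train, test = split_70_30(data)                      # hold out 30% first
best_alpha = tune_alpha(train, [0.0, 1.0, 10.0, 100.0])  # CV on train only
model = fit_ridge_1d(train, best_alpha)              # refit on all of train
test_r2 = r2(model, test)
test_mse = mse(model, test)
print(f"alpha={best_alpha}, R2={test_r2:.2f}, MSE={test_mse:.2f}")
```

The point of the design is that the test rows are set aside before any tuning, so the cross-validation that selects `alpha` never sees them and the reported R² and MSE reflect unseen data.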

Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was again the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07.
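For reference, the two metrics reported above are conventionally defined as follows. The notation is introduced here and is not taken from the paper: $y_i$ is the observed outcome for test patient $i$, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean observed outcome, and $n$ the number of test patients.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Because R² scales the squared error by the outcome variance of the data it is computed on, a model evaluated on one set of patients can show both a higher R² and a higher MSE than a model evaluated on a different set, as long as the outcome variance differs between the two.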

Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
