Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

Tuberculosis and Respiratory Diseases | IF 2.5 | JCR Q2 (Respiratory System) | Pub Date: 2023-07-01 | DOI: 10.4046/trd.2022.0048
Oh Beom Kwon, Solji Han, Hwa Young Lee, Hye Seon Kang, Sung Kyoung Kim, Ju Sang Kim, Chan Kwon Park, Sang Haak Lee, Seung Joon Kim, Jin Woo Kim, Chang Dong Yeo
Volume 86, Issue 3, pages 203-215 | Journal Article | PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/85/a1/trd-2022-0048.PMC10323210.pdf
Citations: 0

Abstract


Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models.

Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models.

Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets.
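The workflow described in the Methods (a 70:30 split, five-fold cross-validation for hyperparameter tuning, and evaluation by R² and MSE on the held-out data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the hyperparameter grids and median imputation are arbitrary choices, and XGBoost and LightGBM are left out so that the example depends only on scikit-learn. It also assumes that the step performed "after dataset splitting" in set III refers to handling missing values using the training portion only, which is why the imputer sits inside the pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): 300 "patients", 6 features.
rng = np.random.default_rng(0)
n_patients, n_features = 300, 6
X = rng.normal(size=(n_patients, n_features))
y = (70 + 8 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] * X[:, 3]
     + rng.normal(scale=4, size=n_patients))

# Blank out about 10% of the feature values to mimic missing data (as in set III).
X[rng.random(X.shape) < 0.10] = np.nan

# 70:30 split into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Four of the six model families named in the abstract; the grids are illustrative.
candidates = {
    "LASSO": (Lasso(max_iter=10_000), {"model__alpha": [0.01, 0.1, 1.0]}),
    "Ridge": (Ridge(), {"model__alpha": [0.1, 1.0, 10.0]}),
    "ElasticNet": (
        ElasticNet(max_iter=10_000),
        {"model__alpha": [0.01, 0.1, 1.0], "model__l1_ratio": [0.2, 0.5, 0.8]},
    ),
    "RandomForest": (
        RandomForestRegressor(n_estimators=50, random_state=0),
        {"model__max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # The imputer and scaler are refit on each training fold, so no
    # information from validation or test rows leaks into them.
    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", estimator),
    ])
    # Five-fold cross-validation on the training split for hyperparameter tuning.
    search = GridSearchCV(pipeline, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    predictions = search.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, predictions),
        "mse": mean_squared_error(y_test, predictions),
    }

for name, scores in results.items():
    print(f"{name:12s} R2={scores['r2']:.2f}  MSE={scores['mse']:.2f}")
```

Keeping the imputer inside the pipeline is the design choice that matters here: fitting it on the full dataset before splitting would let test-set statistics influence training.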

Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07.
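For readers comparing the two metrics across sets, the standard definitions are sketched below. The symbols $y_i$ (observed outcome), $\hat{y}_i$ (predicted outcome), $\bar{y}$ (mean observed outcome), and $n$ (number of test patients) are notation introduced here, not taken from the abstract, and the sketch assumes both metrics were computed on the same test set.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
    = 1 - \frac{\mathrm{MSE}}{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

Under that assumption, MSE / (1 − R²) estimates the variance of the outcome in the test set: roughly 154.95 / 0.5 ≈ 310 for set II and 174.07 / 0.44 ≈ 396 for set III. A wider spread of outcomes in the set III test data would be one way for set III to show both a higher R² and a higher MSE than set II; the abstract itself does not state this.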

Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
