Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study
A. Boueiz, Zhonghui Xu, Yale Chang, A. Masoomi, A. Gregory, S. Lutz, D. Qiao, J. Crapo, J. Dy, E. Silverman, P. Castaldi
Chronic Obstructive Pulmonary Diseases (Journal Article), published 2022-05-20
DOI: 10.15326/jcopdf.2021.0275
Citation count: 2
Abstract
Background
The heterogeneous nature of COPD complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features.
Methods
We included 4,496 smokers with data available from both their enrollment and 5-year follow-up visits in the Genetic Epidemiology of COPD (COPDGene) study. We constructed linear regression (LR) and supervised random forest (RF) models to predict 5-year progression in FEV1 from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results using data from the COPDGene 10-year follow-up visit.
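The cross-validated modeling workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The outcome is assumed to be follow-up FEV1 minus baseline FEV1. The helper names (`fev1_change`, `k_fold_indices`, `fit_ols_1d`, `out_of_fold_predictions`) are hypothetical. A one-feature ordinary least squares fit stands in for the 46-feature LR model. The random forest is omitted so the example has no dependencies; in practice a library model such as scikit-learn's `RandomForestRegressor` could be swapped in at the fitting step.

```python
import random
from typing import Callable, List, Sequence


def fev1_change(baseline_l: float, follow_up_l: float) -> float:
    """Hypothetical outcome: FEV1 change in liters (negative means decline)."""
    return follow_up_l - baseline_l


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition participant indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols_1d(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """One-feature ordinary least squares; a stand-in for the multi-feature LR model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # constant feature -> predict the mean
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def out_of_fold_predictions(
    x: Sequence[float], y: Sequence[float], k: int = 5, seed: int = 0
) -> List[float]:
    """Predict each participant with a model trained only on the other k-1 folds."""
    preds = [0.0] * len(y)
    for test in k_fold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_ols_1d([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(x[i])
    return preds
```

Each participant lands in exactly one test fold. The out-of-fold predictions can therefore be compared with the observed FEV1 changes as though every participant were unseen data.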
Results
Predicting the change in FEV1 over time is more challenging than predicting the future absolute FEV1 level. For RF, R-squared was 0.15, and the area under the ROC curve (AUC) for identifying subjects in the top quartile of observed progression was 0.71 in the testing samples. The corresponding values in the 10-year validation data were 0.10 and 0.70. RF performed slightly better than LR. Accuracy was best for GOLD 1-2 subjects, and accurate prediction was harder to achieve in advanced stages of the disease. Predictive variables differed in their relative importance, and the relative importance of predictors also differed by GOLD stage.
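The two reported metrics can be illustrated with a short sketch. The following assumptions are hypothetical and not taken from the paper:

* Progression is expressed as a positive decline, meaning baseline minus follow-up FEV1.
* The top quartile is defined by a simple empirical 75th-percentile cutoff.
* The model's predicted decline serves as the score for the ROC analysis.

R-squared is computed as 1 - SS_res/SS_tot. The AUC uses the Mann-Whitney formulation: the probability that a randomly chosen top-quartile subject receives a higher predicted decline than a randomly chosen subject outside the top quartile. The function names are illustrative.

```python
from typing import List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (y_true must vary)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def top_quartile_labels(decline: Sequence[float]) -> List[int]:
    """Label 1 for subjects at or above a simple empirical 75th percentile.

    Assumption: larger `decline` values mean faster progression.
    Ties at the cutoff can make the positive group slightly larger than 25%.
    """
    ranked = sorted(decline)
    cutoff = ranked[int(0.75 * len(ranked))]
    return [1 if d >= cutoff else 0 for d in decline]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative subject")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc(top_quartile_labels(observed), predicted)` gives the top-quartile AUC for one set of out-of-fold predictions. The pairwise loop is quadratic in the number of subjects. That cost is acceptable for a cohort of a few thousand, but a rank-based computation would scale better.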
Conclusion
RF combined with deep phenotyping predicts FEV1 progression with reasonable accuracy, although there is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful for selecting patient populations for targeted clinical trials.