Identification of Severe Acute Exacerbations of Chronic Obstructive Pulmonary Disease Subgroups by Machine Learning Implementation in Electronic Health Records.
Huan Li, John Huston, Jana Zielonka, Shannon Kay, Maor Sauler, Jose Gomez
Journal: Chronic Obstructive Pulmonary Diseases: Journal of the COPD Foundation. Published: 2024-10-17. DOI: 10.15326/jcopdf.2024.0556 (https://doi.org/10.15326/jcopdf.2024.0556)
Citation count: 0
Abstract
Rationale: Acute exacerbations of COPD (AECOPD) are heterogeneous. Machine learning (ML) has previously been used to dissect some of the heterogeneity in COPD. The widespread adoption of electronic health records (EHRs) has led to the rapid accumulation of large amounts of patient data as part of routine clinical care. However, it is unclear whether the implementation of ML in EHR-derived data has the potential to identify subgroups of AECOPD.
Objectives: Determine whether ML implementation using EHR data from severe AECOPD requiring hospitalization identifies relevant subgroups.
Methods: This study used two retrospective cohorts of patients with AECOPD (non-COVID-19 and COVID-19) treated at Yale-New Haven Hospital (YNHHS). K-means clustering was used to identify patient subgroups.
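As an illustration of the clustering step, here is a minimal k-means sketch in plain Python. It is not the authors' pipeline: the abstract does not say which EHR variables were used, how they were scaled, or how the centroids were initialized. The two features (eosinophil count and creatinine), the toy values, the z-score standardization, and the deterministic farthest-first initialization are all assumptions made for the example. Only the choice of three clusters comes from the reported results.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single lab value dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic init (an assumption): start at row 0, then repeatedly
    pick the row farthest from the centroids chosen so far."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        )
    return [list(c) for c in centroids]

def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids

# Hypothetical toy admissions: [eosinophils (10^3/uL), creatinine (mg/dL)].
# These values are invented for illustration, not taken from the study.
toy = [
    [0.1, 0.9], [0.2, 1.0], [0.1, 1.1],  # low eosinophils, normal creatinine
    [0.1, 2.4], [0.2, 2.6], [0.1, 2.5],  # elevated creatinine
    [0.8, 1.0], [0.9, 0.9], [0.7, 1.1],  # elevated eosinophils
]
labels, centroids = kmeans(standardize(toy), k=3)
print(labels)
```

On real EHR data, the number of clusters would usually be chosen with a stability or silhouette-type criterion and the algorithm restarted from several initializations. The abstract does not describe how this was done in the study.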
Measurements and main results: We identified three subgroups in the non-COVID cohort (n=1,736), each with distinct clinical characteristics. The reference subgroup was the largest (n=904), followed by the cardio-renal (n=548) and eosinophilic (n=284) subgroups. The eosinophilic subgroup had milder AECOPD, including a shorter hospital stay (p<0.01). The cardio-renal subgroup had the highest mortality both during hospitalization (5%) and in the year after it (30%). Validation of the severe AECOPD classifier in the COVID-19 cohort recapitulated the characteristics seen in the non-COVID cohort. AECOPD subgroups in the COVID-19 cohort had different IL-1 beta, IL-2R, and IL-8 levels (FDR ≤ 0.05). These specific leukocyte and cytokine profiles resulted in inflammatory differences between the AECOPD subgroups based on C-reactive protein levels.
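The FDR threshold reported for the cytokine comparisons implies a multiple-testing correction. The abstract does not name the procedure, so the sketch below assumes the common Benjamini-Hochberg method. The p-values are invented placeholders, not study results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four unnamed markers (illustration only).
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
significant = [q <= 0.05 for q in q_values]
print(q_values, significant)
```

In this toy input only the first marker would pass an FDR threshold of 0.05, even though three raw p-values are below 0.05. That difference is why an FDR is reported when several cytokines are compared at once.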
Conclusions: Applying ML to EHR data allows the identification of specific clinical and biological subgroups of severe AECOPD.