Obtaining patient phenotypes in SARS-CoV-2 pneumonia, and their association with clinical severity and mortality.

Pneumonia (IF 6.2; JCR Q1, Respiratory System). Pub. date: 2024-06-25. DOI: 10.1186/s41479-024-00132-0

Fernando García-García, Dae-Jin Lee, Mónica Nieves-Ermecheo, Olaia Bronte, Pedro Pablo España, José María Quintana, Rosario Menéndez, Antoni Torres, Luis Alberto Ruiz Iturriaga, Isabel Urrutia

Pneumonia 16(1): 12 (2024). PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11637184/pdf/


Abstract

Background: Consistent empirical evidence in the literature points to ample heterogeneity in the clinical evolution of patients with COVID-19. Identifying specific phenotypes underlying the population might contribute to a better understanding and characterization of the different courses of the disease. The aim of this study was to identify distinct clinical phenotypes among hospitalized patients with SARS-CoV-2 pneumonia using machine learning clustering, and to study their association with subsequent clinical outcomes such as severity and mortality.

Methods: This was a multicentre, observational, prospective, longitudinal cohort study conducted in four hospitals in Spain. We included adult patients hospitalized for SARS-CoV-2 pneumonia. We collected a broad spectrum of variables to describe each case exhaustively: patient demographics, comorbidities, symptoms, physiological status, and baseline examinations (blood analytics, arterial blood gas test), among others. For the development and internal validation of the clustering/phenotype models, the dataset was split into training and test sets (50% each). We proposed a sequence of machine learning stages: feature scaling, missing-data imputation, dimensionality reduction via Kernel Principal Component Analysis (KPCA), and clustering with the k-means algorithm. The optimal cluster model parameters, including k (the number of phenotypes), were chosen automatically by maximizing the average silhouette score across the training set.
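The Methods pipeline can be sketched with scikit-learn. This is a minimal, hypothetical illustration, not the authors' code: the median imputer, the RBF kernel, the hyperparameter grids, the helper names `build_pipeline` and `select_model`, and the synthetic `make_blobs` data are all assumptions. For each sample, the silhouette is s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance to its own cluster and b(i) the mean distance to the nearest other cluster; computing its average in the KPCA-reduced space of the training half is also an assumption.

```python
# Hypothetical sketch of the pipeline described in the Methods:
# scaling -> imputation -> Kernel PCA -> k-means, with k and the KPCA
# hyperparameters chosen by maximizing the mean silhouette on the training half.
import itertools

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import KernelPCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(k, n_components, gamma):
    """Chain the four stages in the order the abstract lists them."""
    return Pipeline([
        # StandardScaler disregards NaNs when fitting and keeps them in the
        # output, so scaling can run before imputation.
        ("scale", StandardScaler()),
        # Assumption: median imputation (the abstract does not name the imputer).
        ("impute", SimpleImputer(strategy="median")),
        # Assumption: RBF kernel (the abstract does not name the kernel).
        ("kpca", KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)),
        ("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
    ])


def select_model(X_train, ks, n_components_grid, gamma_grid):
    """Grid-search the parameters, keeping the model with the highest mean silhouette."""
    best_score, best_params, best_pipe = -1.0, None, None
    for k, n_comp, gamma in itertools.product(ks, n_components_grid, gamma_grid):
        pipe = build_pipeline(k, n_comp, gamma).fit(X_train)
        # Silhouette is evaluated in the reduced (KPCA) space where clustering ran.
        embedded = pipe[:-1].transform(X_train)
        score = silhouette_score(embedded, pipe[-1].labels_)
        if score > best_score:
            best_score, best_params, best_pipe = score, (k, n_comp, gamma), pipe
    return best_pipe, best_params, best_score


# Synthetic stand-in for the clinical feature matrix (not study data).
X, _ = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject ~5% missing values

# 50/50 split for development and internal validation.
X_train, X_test = train_test_split(X, test_size=0.5, random_state=0)
pipe, (best_k, best_ncomp, best_gamma), best_sil = select_model(
    X_train, ks=range(2, 7), n_components_grid=[2, 5], gamma_grid=[0.01, 0.1, 0.5]
)
# Assign held-out cases to the phenotypes learned on the training half.
test_labels = pipe.predict(X_test)
print(best_k, best_ncomp, best_gamma, round(best_sil, 3))
```

Fitting every stage on the training half only and then calling `pipe.predict` on the held-out half keeps test cases out of model selection, which is one plausible way to use the 50/50 split for internal validation.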

Results: We enrolled 1548 patients, each characterized by 92 clinical attributes (d = 109 features after variable encoding). Our clustering algorithm identified k = 3 distinct phenotypes and 18 strongly informative variables. Values below are within-cluster medians. Phenotype A (788 cases, 50.9% prevalence): age $\sim$ 57, Charlson comorbidity index $\sim$ 1, pneumonia CURB-65 score $\sim$ 0 to 1, respiratory rate at admission $\sim$ 18 min$^{-1}$, FiO₂ $\sim$ 21%, C-reactive protein (CRP) $\sim$ 49.5 mg/dL. Phenotype B (620 cases, 40.0%): age $\sim$ 75, Charlson $\sim$ 5, CURB-65 $\sim$ 1 to 2, respiratory rate $\sim$ 20 min$^{-1}$, FiO₂ $\sim$ 21%, CRP $\sim$ 101.5 mg/dL. Phenotype C (140 cases, 9.0%): age $\sim$ 71, Charlson $\sim$ 4, CURB-65 $\sim$ 0 to 2, respiratory rate $\sim$ 30 min$^{-1}$, FiO₂ $\sim$ 38%, CRP $\sim$ 152.3 mg/dL. Hypothesis testing provided solid statistical evidence of an association between phenotype and each clinical outcome (severity and mortality). The corresponding odds ratios showed a clear trend: unfavourable evolution was more frequent in phenotype C than in B, and more frequent in B than in A.
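The abstract names neither the hypothesis test nor how the odds ratios were estimated. A minimal sketch, assuming a chi-squared test of independence on a phenotype × outcome contingency table and pairwise odds ratios with a Woolf (log-scale) 95% confidence interval, could look like this. The helper names `association_test` and `odds_ratio` and the event counts are hypothetical, not study data.

```python
# Hypothetical sketch: phenotype-outcome association test and pairwise odds ratios.
import math

import numpy as np
from scipy.stats import chi2_contingency


def association_test(events, totals):
    """Chi-squared test of independence on a phenotype x (event, no event) table."""
    events = np.asarray(events)
    totals = np.asarray(totals)
    table = np.column_stack([events, totals - events])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


def odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Odds of the outcome in group A relative to group B, with a Woolf 95% CI."""
    a, b = events_a, total_a - events_a  # group A: events, non-events
    c, d = events_b, total_b - events_b  # group B: events, non-events
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, (lo, hi)


# Hypothetical counts for phenotypes A, B, C (illustrative only, not study results).
events = [10, 20, 30]
totals = [100, 100, 100]
chi2, p, dof = association_test(events, totals)
or_ba, ci_ba = odds_ratio(events[1], totals[1], events[0], totals[0])  # B vs A
or_cb, ci_cb = odds_ratio(events[2], totals[2], events[1], totals[1])  # C vs B
print(round(chi2, 2), dof, round(or_ba, 2), round(or_cb, 2))
```

With the study's actual outcome counts per phenotype (not reported in the abstract), the same two comparisons, C versus B and B versus A, would express the ordering of unfavourable evolution described in the Results.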

Conclusion: A compound unsupervised clustering technique (including a fully automated optimization of its internal parameters) revealed three distinct groups of patients (phenotypes). In turn, these phenotypes were strongly associated with the clinical severity of pneumonia progression and with mortality.
