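An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma.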

IF 1.2, Zone 4 (Mathematics), International Journal of Biostatistics. Pub Date: 2024-06-25, eCollection Date: 2024-11-01, Pages: 361-388, DOI: 10.1515/ijb-2023-0061

Massimo Bilancia, Andrea Nigri, Barbara Cafarelli, Danilo Di Bona


Citations: 0

Abstract


An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma.

Asthma is a disease characterized by chronic airway hyperresponsiveness and inflammation, with signs of variable airflow limitation and impaired lung function leading to respiratory symptoms such as shortness of breath, chest tightness and cough. Eosinophilic asthma is a distinct phenotype that affects more than half of patients diagnosed with severe asthma. It can be effectively treated with monoclonal antibodies targeting specific immunological signaling pathways that fuel the inflammation underlying the disease, particularly Interleukin-5 (IL-5), a cytokine that plays a crucial role in asthma. In this study, we propose a data analysis pipeline aimed at identifying subphenotypes of severe eosinophilic asthma in relation to response to therapy at follow-up, which could have great potential for use in routine clinical practice. Once an optimal partition of patients into subphenotypes has been determined, the labels indicating the group to which each patient has been assigned are used in a novel way. For each input variable in a specialized logistic regression model, a clusterwise effect on response to therapy is determined by an appropriate interaction term between the input variable under consideration and the cluster label. We show that the clusterwise odds ratios can be meaningfully interpreted conditional on the cluster label. In this way, we can define an effect measure for the response variable for each input variable in each of the groups identified by the clustering algorithm, which is not possible in standard logistic regression because the effect of the reference class is aliased with the overall intercept. The interpretability of the model is enforced by promoting sparsity, a goal achieved by learning interactions in a hierarchical manner using a special group-Lasso technique. In addition, valid expressions are provided for computing odds ratios in the unusual parameterization used by the sparsity-promoting algorithm. 
We show how to apply the proposed data analysis pipeline to the problem of sub-phenotyping asthma patients also in terms of quality of response to therapy with monoclonal antibodies.
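The clusterwise odds ratios mentioned in the abstract can be illustrated with a minimal sketch. It assumes ordinary reference-cell (dummy) coding, in which the log-odds of response are intercept + cluster offset + (slope + cluster-specific slope shift) * x, so that the odds ratio for a one-unit increase in x, conditional on cluster k, is exp(slope + shift_k). This is not the parameterization used by the sparsity-promoting algorithm in the paper, and every coefficient value and function name below is hypothetical.

```python
import math


def linear_predictor(x, cluster, intercept, cluster_offsets, slope, slope_shifts):
    """Log-odds of response for input value x within the given cluster."""
    return (intercept
            + cluster_offsets[cluster]
            + (slope + slope_shifts[cluster]) * x)


def clusterwise_odds_ratios(slope, slope_shifts):
    """Odds ratio for a one-unit increase in x, conditional on each cluster."""
    return {k: math.exp(slope + shift) for k, shift in slope_shifts.items()}


# Hypothetical coefficients (not taken from the paper).
# Cluster "A" is the reference class, so its offset and slope shift are zero.
intercept = -0.5
cluster_offsets = {"A": 0.0, "B": 0.8, "C": -0.3}
slope = 0.4
slope_shifts = {"A": 0.0, "B": -0.6, "C": 0.5}

odds_ratios = clusterwise_odds_ratios(slope, slope_shifts)

# Check against the definition: odds at x + 1 divided by odds at x,
# holding the cluster label fixed.
for k in cluster_offsets:
    eta_low = linear_predictor(1.0, k, intercept, cluster_offsets, slope, slope_shifts)
    eta_high = linear_predictor(2.0, k, intercept, cluster_offsets, slope, slope_shifts)
    direct = math.exp(eta_high) / math.exp(eta_low)
    assert math.isclose(direct, odds_ratios[k])
    print(f"cluster {k}: odds ratio per unit of x = {odds_ratios[k]:.3f}")
```

With these made-up values the same input variable raises the odds of response in clusters A and C and lowers them in cluster B, which is the kind of cluster-conditional reading the abstract describes. The cluster offsets cancel in the ratio, so the effect measure depends only on the main slope and the interaction term.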


Source journal

International Journal of Biostatistics Mathematics-Statistics and Probability

CiteScore: 2.30

Self-citation rate: 8.30%

Journal description: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, and original applications of statistical methods to important practical problems arising from the biological, medical, public health, and agricultural sciences, with an emphasis on semiparametric methods. Given that many publication venues exist within biostatistics, IJB offers a home for research in biostatistics focusing on modern methods, often based on machine learning and other data-adaptive methodologies, and provides a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. The journal is intended to cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows data and software code to be appended and opens the door to reproducible research, allowing readers to easily replicate the analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.