Biostatistics: Latest Articles, Page 2

Incorporating prior information in gene expression network-based cancer heterogeneity analysis.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae028

Rong Li, Shaodong Xu, Yang Li, Zuojian Tang, Di Feng, James Cai, Shuangge Ma

Cancer is molecularly heterogeneous, with seemingly similar patients having different molecular landscapes and accordingly different clinical behaviors. In recent studies, gene expression networks have been shown to be more effective and informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" and "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expressions in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and often weak signals. To effectively tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulation demonstrates the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, the proposed approach yields findings that differ from those of the alternatives, and the identified sample subgroups have important clinical differences.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12550826/pdf/

Citations: 0

Determining vaccine responders in the presence of baseline immunity using single-cell assays and paired control samples.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf045

Zhe Chen, Siyu Heng, Asa Tapley, Stephen De Rosa, Bo Zhang

A key objective in vaccine studies is to evaluate vaccine-induced immunogenicity and determine whether participants have mounted a response to the vaccine. Cellular immune responses are essential for assessing vaccine-induced immunogenicity, and single-cell assays, such as intracellular cytokine staining (ICS) and B-cell phenotyping (BCP), are commonly employed to profile individual immune cell phenotypes and the cytokines they produce after stimulation. In this article, we introduce a novel statistical framework for identifying vaccine responders using ICS data collected before and after vaccination. This framework incorporates paired control data to account for potential unintended variations between assay runs, such as batch effects, that could lead to misclassification of participants as vaccine responders or non-responders. To formally integrate paired control data and account for assay variation across different time points (i.e., before and after vaccination), our proposed framework calculates and reports two $P$-values, both adjusting for paired control data but in distinct ways: (i) the maximally adjusted $P$-value, which applies the most conservative adjustment to the unadjusted $P$-value, ensuring validity over all plausible batch effects consistent with the paired control samples' data, and (ii) the minimally adjusted $P$-value, which imposes only the minimal adjustment to the unadjusted $P$-value, such that the adjusted $P$-value cannot be falsified by the paired control samples' data. Minimally and maximally adjusted $P$-values offer a balanced approach to managing Type I error rates and statistical power in the presence of batch effects. We apply this framework to analyze ICS data collected at baseline and 4 weeks post-vaccination from the COVID-19 Prevention Network (CoVPN) 3008 study. Our analysis helps address two clinical questions: (i) which participants exhibited evidence of an incident Omicron infection between baseline and 4 weeks after receiving the final dose of the primary vaccination series, and (ii) which participants showed vaccine-induced T cell responses against the Omicron BA.4/5 Spike protein.

Citations: 0

Shared parameter modeling of longitudinal data allowing for possibly informative visiting process and terminal event.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae041

Christos Thomadakis, Loukia Meligkotsidou, Nikos Pantazis, Giota Touloumi

Joint modeling of longitudinal and time-to-event data, particularly through shared parameter models (SPMs), is a common approach for handling longitudinal marker data with an informative terminal event. A critical but often neglected assumption in this context is that the visiting/observation process is noninformative, depending solely on past marker values and visit times. When this assumption fails, the visiting process becomes informative, potentially resulting in biased SPM estimates. Existing methods generally rely on a conditional independence assumption, positing that the marker model, visiting process, and time-to-event model are independent given shared or correlated random effects. Moreover, they are typically built on an intensity-based visiting process using calendar time. This study introduces a unified approach for jointly modeling a normally distributed marker, the visiting process, and time-to-event data in the form of competing risks. Our model conditions on the history of observed marker values, prior visit times, the marker's random effects, and possibly a frailty term independent of the random effects. While our approach aligns with the shared-parameter framework, it does not presume conditional independence between the processes. Additionally, the visiting process can be defined on either a gap time scale, via proportional hazard models, or a calendar time scale, via proportional intensity models. Through extensive simulation studies, we assess the performance of our proposed methodology. We demonstrate that disregarding an informative visiting process can yield significantly biased marker estimates. However, misspecification of the visiting process can also lead to biased estimates. The gap time formulation exhibits greater robustness compared to the intensity-based model when the visiting process is misspecified. In general, enriching the visiting process with prior visit history enhances performance. We further apply our methodology to real longitudinal data from HIV, where visit frequency varies substantially among individuals.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911807/pdf/

Citations: 0

Recoverability of causal effects under presence of missing data: a longitudinal case study.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae044

Anastasiia Holovchak, Helen McIlleron, Paolo Denti, Michael Schomaker

Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize the sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness-directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of a "closed missingness mechanism": if missing data are generated based on this mechanism, an available case analysis is admissible for consistent estimation of any statistical or causal estimand, even if data are missing not at random. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses demonstrate an innovative application of missingness DAGs to complex longitudinal real-world data, while highlighting the sensitivity of the results with respect to the assumed causal model.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617375/pdf/

Citations: 0

A filtering approach for statistical inference in a stochastic SIR model with an application to Covid-19 data.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf036

Katia Colaneri, Camilla Damian, Rüdiger Frey

In this paper, we consider a discrete-time stochastic SIR model, where the transmission rate and the number of infectious individuals are random and unobservable. This model accounts for random fluctuations in infectiousness and for non-detected infections. Thus, statistical inference has to be performed in a partial information setting. We adopt a Bayesian approach and use nested particle filtering to estimate the state of the system and the parameters. Moreover, we discuss forecasts and model tests based on the posterior predictive distribution. As a case study, we apply our methodology to Austrian Covid-19 infection data.
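
To make the partial-information setting concrete, the following is a minimal sketch of a plain bootstrap particle filter for a toy discrete-time stochastic SIR model. It is not the nested particle filter of the paper: the parameters are treated as known rather than estimated, and the specific choices (binomial infections and recoveries, a log random walk for the transmission rate, binomial under-reporting with probability `p_detect`, and all function names) are assumptions made for illustration only.

```python
import math
import random


def binom_draw(rng, n, p):
    """Draw from Binomial(n, p) by summing Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def binom_pmf(k, n, p):
    """Binomial probability mass function; zero outside the support."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def step(rng, state, pop, gamma, sigma):
    """Advance one (S, I, beta) state by one time step; also return new infections."""
    s, i, beta = state
    beta = beta * math.exp(sigma * rng.gauss(0.0, 1.0))  # random transmission rate
    p_inf = 1.0 - math.exp(-beta * i / pop)
    new_inf = binom_draw(rng, s, p_inf)
    recovered = binom_draw(rng, i, gamma)
    return (s - new_inf, i + new_inf - recovered, beta), new_inf


def simulate(pop, i0, beta0, gamma, sigma, p_detect, n_steps, seed=1):
    """Simulate reported cases (observed) and the hidden number of infectious."""
    rng = random.Random(seed)
    state = (pop - i0, i0, beta0)
    observed, infectious = [], []
    for _ in range(n_steps):
        state, new_inf = step(rng, state, pop, gamma, sigma)
        observed.append(binom_draw(rng, new_inf, p_detect))  # non-detected infections
        infectious.append(state[1])
    return observed, infectious


def particle_filter(observed, pop, i0, beta0, gamma, sigma, p_detect,
                    n_particles, seed=0):
    """Return the filtered mean of the hidden infectious count at each step."""
    rng = random.Random(seed)
    particles = [(pop - i0, i0, beta0)] * n_particles
    filtered = []
    for y in observed:
        moved = [step(rng, p, pop, gamma, sigma) for p in particles]
        states = [st for st, _ in moved]
        weights = [binom_pmf(y, new_inf, p_detect) for _, new_inf in moved]
        total = sum(weights)
        if total == 0.0:
            # No particle can explain y; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        filtered.append(sum(w * st[1] for w, st in zip(weights, states)) / total)
        particles = rng.choices(states, weights=weights, k=n_particles)
    return filtered
```

Calling `simulate(200, 5, 0.6, 0.2, 0.05, 0.5, 15)` and passing the observed series to `particle_filter` with the same parameters gives a filtered trajectory that can be compared with the simulated hidden one. Estimating the parameters as well would require an outer layer of parameter particles, which this sketch omits.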

Citations: 0

Bipartite interference and air pollution transport: estimating health effects of power plant interventions.

IF 2 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae051

Corwin Zigler, Vera Liu, Fabrizia Mealli, Laura Forastiere

Evaluating air quality interventions is confronted with the challenge of interference since interventions at a particular pollution source likely impact air quality and health at distant locations, and air quality and health at any given location are likely impacted by interventions at many sources. The structure of interference in this context is dictated by complex atmospheric processes governing how pollution emitted from a particular source is transformed and transported across space and can be cast with a bipartite structure reflecting the two distinct types of units: (i) interventional units on which treatments are applied or withheld to change pollution emissions; and (ii) outcome units on which outcomes of primary interest are measured. We propose new estimands for bipartite causal inference with interference that construe two components of treatment: a "key-associated" (or "individual") treatment and an "upwind" (or "neighborhood") treatment. Estimation is carried out using a covariate adjustment approach based on a joint propensity score. A reduced-complexity atmospheric model characterizes the structure of the interference network by modeling the movement of air parcels through time and space. The new methods are deployed to evaluate the effectiveness of installing flue-gas desulfurization scrubbers on 472 coal-burning power plants (the interventional units) in reducing Medicare hospitalizations among 21,577,552 Medicare beneficiaries residing across 25,553 ZIP codes in the United States (the outcome units).
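
As a rough illustration of the two treatment components, the sketch below derives a "key-associated" and an "upwind" treatment for each outcome unit from a bipartite influence matrix. The abstract does not give the definitions, so two assumptions are made here: the key-associated source is taken to be the one with the largest influence weight on an outcome unit, and the upwind treatment is the influence-weighted share of the remaining sources that are treated. The function name and the toy numbers are hypothetical.

```python
def bipartite_treatments(influence, treated):
    """Map source-level treatments to outcome-unit treatments.

    influence[j][i] is the weight of source i on outcome unit j
    (for example, from an atmospheric transport model);
    treated[i] is 1 if source i received the intervention, else 0.
    Returns one (key_treatment, upwind_treatment) pair per outcome unit.
    """
    result = []
    for row in influence:
        # Assumption: the key-associated source is the most influential one.
        key = max(range(len(row)), key=lambda i: row[i])
        other_weight = sum(w for i, w in enumerate(row) if i != key)
        upwind = 0.0
        if other_weight > 0:
            # Assumption: upwind treatment is a weighted treated share.
            upwind = sum(w * treated[i]
                         for i, w in enumerate(row) if i != key) / other_weight
        result.append((treated[key], upwind))
    return result


# Toy example: two outcome units (rows), three sources (columns).
influence = [[0.7, 0.2, 0.1],
             [0.1, 0.3, 0.6]]
treated = [1, 0, 1]
pairs = bipartite_treatments(influence, treated)
```

In this toy example both outcome units have a treated key-associated source, but their upwind exposure differs, which is the kind of contrast the two-component estimands are meant to separate.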

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823286/pdf/

Citations: 0

Understanding the opioid syndemic in North Carolina: A novel approach to modeling and identifying factors.

IF 1.8 | Zone 3 (Mathematics) | Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae052

Eva Murphy, David Kline, Kathleen L Egan, Kathryn E Lancaster, William C Miller, Lance A Waller, Staci A Hepler

The opioid epidemic is a significant public health challenge in North Carolina, but limited data restrict our understanding of its complexity. Examining trends and relationships among different outcomes believed to reflect opioid misuse provides an alternative perspective to understand the opioid epidemic. We use a Bayesian dynamic spatial factor model to capture the interrelated dynamics within six different county-level outcomes, such as illicit opioid overdose deaths, emergency department visits related to drug overdose, treatment counts for opioid use disorder, patients receiving prescriptions for buprenorphine, and newly diagnosed cases of acute and chronic hepatitis C virus and human immunodeficiency virus. We design the factor model to yield meaningful interactions among predefined subsets of these outcomes, causing a departure from the conventional lower triangular structure in the loadings matrix and leading to familiar identifiability issues. To address this challenge, we propose a novel approach that involves decomposing the loadings matrix within a Markov chain Monte Carlo algorithm, allowing us to estimate the loadings and factors uniquely. As a result, we gain a better understanding of the spatio-temporal dynamics of the opioid epidemic in North Carolina.
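
The "familiar identifiability issues" can be seen in a few lines: in a factor model, rotating the loadings and counter-rotating the factors leaves the fitted values unchanged, so the data alone cannot distinguish the two. The sketch below shows only this generic rotation invariance with made-up numbers; it does not reproduce the decomposition proposed in the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def rotation(theta):
    """2x2 rotation matrix, an orthogonal Q with Q Q^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Made-up loadings (4 outcomes x 2 factors) and factors (2 factors x 3 times).
loadings = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.8], [0.3, 0.3]]
factors = [[0.2, -1.0, 0.7], [1.5, 0.4, -0.3]]

q = rotation(0.9)
rotated_loadings = matmul(loadings, q)            # Lambda Q
rotated_factors = matmul(transpose(q), factors)   # Q^T F

fit_original = matmul(loadings, factors)
fit_rotated = matmul(rotated_loadings, rotated_factors)

# Largest entrywise difference between the two fits (zero up to rounding).
max_gap = max(abs(x - y)
              for r1, r2 in zip(fit_original, fit_rotated)
              for x, y in zip(r1, r2))
```

A lower triangular constraint on the loadings is one conventional way to remove this ambiguity, which is why departing from that structure brings the identifiability problem back.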

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823283/pdf/

Citations: 0

Mediation analysis with graph mediator.

IF 1.8 Region 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf004

Yixi Xu, Yi Zhao

This study introduces a mediation analysis framework for settings in which the mediator is a graph. A Gaussian covariance graph model is assumed for the graph representation, and causal estimands and assumptions are discussed under this representation. With a covariance matrix as the mediator, a low-rank representation is introduced and parametric mediation models are considered under the structural equation modeling framework. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the low-rank representation and the causal parameters. An efficient computational algorithm is proposed, and the asymptotic properties of the estimators are investigated. The performance of the proposed approach is evaluated in simulation studies. In an application to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task.
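
As a rough illustration of the structural equation idea in this abstract, the sketch below computes product-of-coefficients mediation effects for a single scalar mediator. It is a simplification made for illustration, not the authors' estimator: the paper's mediator is a covariance matrix with a low-rank representation, which is not implemented here. The function names and toy data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def mediation_effects(x, m, y):
    """Least-squares fits of m ~ x and y ~ x + m (both with intercepts).

    Returns (indirect, direct, total) effects of x on y.
    Assumption: m is a scalar summary, not a graph or covariance matrix.
    """
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    alpha = cxm / vx                        # slope of m on x
    denom = vx * vm - cxm ** 2              # zero if x and m are collinear
    beta = (cmy * vx - cxm * cxy) / denom   # slope of y on m, adjusting for x
    gamma = (cxy * vm - cxm * cmy) / denom  # slope of y on x, adjusting for m
    return alpha * beta, gamma, alpha * beta + gamma


# Toy data built so that m = 2*x + e and y = 0.5*x + 3*m,
# with e = [-1, 1, -1, 1] uncorrelated with x.
x = [0, 0, 1, 1]
m = [-1, 1, 1, 3]
y = [-3.0, 3.0, 3.5, 9.5]
indirect, direct, total = mediation_effects(x, m, y)
```

On the toy data the indirect effect is the product of the two path slopes (2 and 3), and the total effect is the sum of the direct and indirect effects.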

Citations: 0

Markov switching zero-inflated space-time multinomial models for comparing multiple infectious diseases.

IF 2 Region 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf034

Dirk Douwes-Schultz, Alexandra M Schmidt, Laís Picinini Freitas, Marilia Sá Carvalho

Univariate zero-inflated models are increasingly being used to account for excess zeros in spatio-temporal infectious disease counts. However, the multivariate case is challenging due to the need to account for correlations across space, time and disease in both the count and zero-inflated components of the model. We are interested in comparing the transmission dynamics of several co-circulating infectious diseases across space and time, where some of the diseases can be absent for long periods. We first assume there is a baseline disease that is well-established and always present in the region. The other diseases switch between periods of presence and absence in each area through a series of coupled Markov chains, which account for long periods of disease absence, disease interactions and disease spread from neighboring areas. Since we are mainly interested in comparing the diseases, we assume the cases of the present diseases in an area jointly follow an autoregressive multinomial model. We use the multinomial model to investigate whether there are associations between certain factors, such as temperature, and differences in the transmission intensity of the diseases. Inference is performed using efficient Bayesian Markov chain Monte Carlo methods based on jointly sampling all unknown presence indicators. We apply the model to spatio-temporal counts of dengue, Zika, and chikungunya cases in Rio de Janeiro, during the first triple epidemic there.
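
The switching mechanism described above can be pictured with a toy simulation: one disease in one area moves between absence and presence through a two-state Markov chain, and reported counts are forced to zero while the disease is absent. This is only a sketch under simplifying assumptions. It uses a single chain with fixed transition probabilities and a Poisson count, whereas the paper couples chains across diseases and neighboring areas and uses an autoregressive multinomial model. All names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small means only."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_switching_counts(n_steps, p_enter, p_stay, mean_when_present, seed=0):
    """Simulate presence indicators and zero-inflated counts.

    p_enter: probability of moving from absence to presence.
    p_stay:  probability of remaining present.
    """
    rng = random.Random(seed)
    present = False  # assumption: the disease starts absent
    states, counts = [], []
    for _ in range(n_steps):
        p_present = p_stay if present else p_enter
        present = rng.random() < p_present
        states.append(present)
        # Counts are structurally zero whenever the disease is absent.
        counts.append(sample_poisson(rng, mean_when_present) if present else 0)
    return states, counts


states, counts = simulate_switching_counts(
    n_steps=52, p_enter=0.1, p_stay=0.9, mean_when_present=5.0, seed=1
)
```

A small `p_enter` combined with a large `p_stay` produces long runs of zeros followed by sustained outbreaks, which is the pattern a plain Poisson model handles poorly.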

Citations: 0

Wastewater-based reproduction rates for epidemic curve reconstruction.

IF 2 Region 3 (Mathematics) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics

Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf033

Emily Somerset, Justin J Slater, Patrick E Brown

We introduce a hierarchical Bayesian framework for reconstructing epidemic curves using under-reported case counts and wastewater data. Our approach models wastewater signals as differentiable Gaussian processes, enabling inference on their relative growth rates, which are used to define a wastewater-based reproduction rate. These estimates are incorporated into a binomially thinned Poisson autoregressive model for case counts using a modular inference strategy. We apply this framework to reconstruct the Covid-19 epidemic curve in Toronto, validating our model through out-of-sample forecasts and comparisons with independent serosurvey-based cumulative incidence estimates. We also apply the framework to New Zealand's Covid-19 data to reconstruct its epidemic curve and demonstrate improvements over an existing joint model for wastewater and case data. A key advantage of our framework, highlighted in this comparison, is that it does not rely on pre-specified constant parameters, allowing the model to better adapt to evolving pandemic conditions.
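
To make "binomially thinned Poisson autoregressive model" concrete, the sketch below simulates true infections whose mean is a reproduction rate times the previous count, then thins them binomially to mimic under-reporting. It is a forward simulation under assumed inputs, not the authors' inference procedure. The first-order dependence, the fixed reporting probability, and every name and value are assumptions made for illustration. In the paper the reproduction rates are derived from wastewater signals modeled as Gaussian processes, which is not shown here.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small means only."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_binomial(rng, n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_reported_cases(initial_infections, reproduction_rates, report_prob, seed=0):
    """Simulate true infections and the under-reported counts observed from them."""
    rng = random.Random(seed)
    previous = initial_infections
    infections, reported = [], []
    for r_t in reproduction_rates:
        # Poisson autoregression: the mean depends on the previous count.
        current = sample_poisson(rng, r_t * previous)
        infections.append(current)
        # Binomial thinning: each infection is reported with probability report_prob.
        reported.append(sample_binomial(rng, current, report_prob))
        previous = current
    return infections, reported


infections, reported = simulate_reported_cases(
    initial_infections=20, reproduction_rates=[1.1] * 8, report_prob=0.4, seed=1
)
```

Reconstructing the epidemic curve is the inverse problem: recovering `infections` when only `reported` and an estimate of the reproduction rates are available.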

Citations: 0