Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf045
Zhe Chen, Siyu Heng, Asa Tapley, Stephen De Rosa, Bo Zhang
A key objective in vaccine studies is to evaluate vaccine-induced immunogenicity and determine whether participants have mounted a response to the vaccine. Cellular immune responses are essential for assessing vaccine-induced immunogenicity, and single-cell assays, such as intracellular cytokine staining (ICS) and B-cell phenotyping (BCP), are commonly employed to profile individual immune cell phenotypes and the cytokines they produce after stimulation. In this article, we introduce a novel statistical framework for identifying vaccine responders using ICS data collected before and after vaccination. This framework incorporates paired control data to account for potential unintended variations between assay runs, such as batch effects, that could lead to misclassification of participants as vaccine responders or non-responders. To formally integrate paired control data and account for assay variation across time points (i.e., before and after vaccination), our proposed framework calculates and reports two $P$-values, both adjusting for paired control data but in distinct ways: (i) the maximally adjusted $P$-value, which applies the most conservative adjustment to the unadjusted $P$-value, ensuring validity over all plausible batch effects consistent with the paired control samples' data, and (ii) the minimally adjusted $P$-value, which imposes only the minimal adjustment to the unadjusted $P$-value, such that the adjusted $P$-value cannot be falsified by the paired control samples' data. Minimally and maximally adjusted $P$-values offer a balanced approach to managing Type I error rates and statistical power in the presence of batch effects. We apply this framework to analyze ICS data collected at baseline and 4 weeks post-vaccination from the COVID-19 Prevention Network (CoVPN) 3008 study. Our analysis helps address two clinical questions: (i) which participants exhibited evidence of an incident Omicron infection between baseline and 4 weeks after receiving the final dose of the primary vaccination series, and (ii) which participants showed vaccine-induced T cell responses against the Omicron BA.4/5 Spike protein.
Title: Determining vaccine responders in the presence of baseline immunity using single-cell assays and paired control samples.
Journal: Biostatistics.
Joint modeling of longitudinal and time-to-event data, particularly through shared parameter models (SPMs), is a common approach for handling longitudinal marker data with an informative terminal event. A critical but often neglected assumption in this context is that the visiting/observation process is noninformative, depending solely on past marker values and visit times. When this assumption fails, the visiting process becomes informative, potentially resulting in biased SPM estimates. Existing methods generally rely on a conditional independence assumption, positing that the marker model, visiting process, and time-to-event model are independent given shared or correlated random effects. Moreover, they are typically built on an intensity-based visiting process using calendar time. This study introduces a unified approach for jointly modeling a normally distributed marker, the visiting process, and time-to-event data in the form of competing risks. Our model conditions on the history of observed marker values, prior visit times, the marker's random effects, and possibly a frailty term independent of the random effects. While our approach aligns with the shared-parameter framework, it does not presume conditional independence between the processes. Additionally, the visiting process can be defined on either a gap time scale, via proportional hazard models, or a calendar time scale, via proportional intensity models. Through extensive simulation studies, we assess the performance of our proposed methodology. We demonstrate that disregarding an informative visiting process can yield significantly biased marker estimates. However, misspecification of the visiting process can also lead to biased estimates. The gap time formulation exhibits greater robustness compared to the intensity-based model when the visiting process is misspecified. In general, enriching the visiting process with prior visit history enhances performance. We further apply our methodology to real longitudinal data from HIV, where visit frequency varies substantially among individuals.
Title: Shared parameter modeling of longitudinal data allowing for possibly informative visiting process and terminal event.
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae041
Christos Thomadakis, Loukia Meligkotsidou, Nikos Pantazis, Giota Touloumi
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911807/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae044
Anastasiia Holovchak, Helen McIlleron, Paolo Denti, Michael Schomaker
Missing data in multiple variables is a common issue. We investigate the applicability of the framework of graphical models for handling missing data to a complex longitudinal pharmacological study of children with HIV treated with an efavirenz-based regimen as part of the CHAPAS-3 trial. Specifically, we examine whether the causal effects of interest, defined through static interventions on multiple continuous variables, can be recovered (estimated consistently) from the available data only. So far, no general algorithms are available to decide on recoverability, and decisions have to be made on a case-by-case basis. We emphasize the sensitivity of recoverability to even the smallest changes in the graph structure, and present recoverability results for three plausible missingness-directed acyclic graphs (m-DAGs) in the CHAPAS-3 study, informed by clinical knowledge. Furthermore, we propose the concept of a "closed missingness mechanism": if missing data are generated based on this mechanism, an available case analysis is admissible for consistent estimation of any statistical or causal estimand, even if data are missing not at random. Both simulations and theoretical considerations demonstrate how, in the assumed MNAR setting of our study, a complete or available case analysis can be superior to multiple imputation, and estimation results vary depending on the assumed missingness DAG. Our analyses demonstrate an innovative application of missingness DAGs to complex longitudinal real-world data, while highlighting the sensitivity of the results with respect to the assumed causal model.
Title: Recoverability of causal effects under presence of missing data: a longitudinal case study.
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617375/pdf/
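The abstract's point that a complete-case analysis can stay valid even when data are missing not at random can be illustrated with a classic textbook setting. The sketch below is an assumption made for illustration only: it is not one of the CHAPAS-3 m-DAGs and not the paper's "closed missingness mechanism". A covariate `x` goes missing with a probability that depends on its own value, yet the complete-case regression slope of `y` on `x` stays close to the truth. The function names, sample size, and missingness probabilities are all hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def complete_case_demo(n=20000, true_slope=2.0, seed=1):
    """Return (full-data slope, complete-case slope, number of complete cases)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]
    # Illustrative MNAR mechanism: x is far more likely to be missing
    # when x itself is large. Missingness does not depend on y given x.
    observed = [rng.random() > (0.8 if x > 0.5 else 0.1) for x in xs]
    cc_x = [x for x, seen in zip(xs, observed) if seen]
    cc_y = [y for y, seen in zip(ys, observed) if seen]
    return ols_slope(xs, ys), ols_slope(cc_x, cc_y), len(cc_x)


if __name__ == "__main__":
    full, cc, n_cc = complete_case_demo()
    print(f"full-data slope: {full:.3f}")
    print(f"complete-case slope: {cc:.3f} from {n_cc} rows")
```

The complete-case slope is unaffected here because selection depends only on the regressor, so the conditional distribution of `y` given `x` is unchanged among the observed rows. Whether a given estimand is recoverable in a real study depends on the assumed missingness graph, as the abstract stresses.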
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf036
Katia Colaneri, Camilla Damian, Rüdiger Frey
In this paper, we consider a discrete-time stochastic SIR model, where the transmission rate and the number of infectious individuals are random and unobservable. This model accounts for random fluctuations in infectiousness and for non-detected infections. Thus, statistical inference has to be performed in a partial information setting. We adopt a Bayesian approach and use nested particle filtering to estimate the state of the system and the parameters. Moreover, we discuss forecasts and model tests based on the posterior predictive distribution. As a case study, we apply our methodology to Austrian Covid-19 infection data.
Title: A filtering approach for statistical inference in a stochastic SIR model with an application to Covid-19 data.
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554006/pdf/
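To make the partial-information setting in the abstract above concrete, here is a minimal forward-simulation sketch of a discrete-time stochastic SIR model with a random transmission rate and incomplete case detection. It is not the authors' model or their nested particle filter. The log-random-walk for the transmission rate, the binomial transitions, the detection probability, and every name in the code are assumptions chosen for illustration.

```python
import math
import random


def simulate_sir(n_steps, s0, i0, r0, beta0, gamma, sigma, detect_prob, seed=0):
    """Simulate a chain-binomial SIR model with a log-random-walk transmission rate.

    Returns one dict per time step. In a filtering setup only 'reported'
    would be observed; S, I, R and beta are hidden states.
    """
    rng = random.Random(seed)

    def binomial(n, p):
        # Sum of Bernoulli draws; adequate for the small populations used here.
        return sum(rng.random() < p for _ in range(n))

    population = s0 + i0 + r0
    s, i, r = s0, i0, r0
    log_beta = math.log(beta0)
    path = []
    for _ in range(n_steps):
        # Assumed dynamics: transmission rate follows a random walk on the log scale.
        log_beta += rng.gauss(0.0, sigma)
        beta = math.exp(log_beta)
        p_infect = 1.0 - math.exp(-beta * i / population)
        p_recover = 1.0 - math.exp(-gamma)
        new_inf = binomial(s, p_infect)
        new_rec = binomial(i, p_recover)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        # Only a fraction of new infections is detected and reported.
        reported = binomial(new_inf, detect_prob)
        path.append({"S": s, "I": i, "R": r, "beta": beta,
                     "new_inf": new_inf, "reported": reported})
    return path


if __name__ == "__main__":
    for step in simulate_sir(10, 990, 10, 0, 0.4, 0.2, 0.1, 0.5, seed=1):
        print(step["reported"], step["I"], round(step["beta"], 3))
```

A particle filter for such a model would propagate many copies of the hidden state with the same transition step and reweight them by the likelihood of the reported counts. That part is deliberately left out of this sketch.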
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae051
Corwin Zigler, Vera Liu, Fabrizia Mealli, Laura Forastiere
Evaluating air quality interventions is confronted with the challenge of interference since interventions at a particular pollution source likely impact air quality and health at distant locations, and air quality and health at any given location are likely impacted by interventions at many sources. The structure of interference in this context is dictated by complex atmospheric processes governing how pollution emitted from a particular source is transformed and transported across space and can be cast with a bipartite structure reflecting the two distinct types of units: (i) interventional units on which treatments are applied or withheld to change pollution emissions; and (ii) outcome units on which outcomes of primary interest are measured. We propose new estimands for bipartite causal inference with interference that construe two components of treatment: a "key-associated" (or "individual") treatment and an "upwind" (or "neighborhood") treatment. Estimation is carried out using a covariate adjustment approach based on a joint propensity score. A reduced-complexity atmospheric model characterizes the structure of the interference network by modeling the movement of air parcels through time and space. The new methods are deployed to evaluate the effectiveness of installing flue-gas desulfurization scrubbers on 472 coal-burning power plants (the interventional units) in reducing Medicare hospitalizations among 21,577,552 Medicare beneficiaries residing across 25,553 ZIP codes in the United States (the outcome units).
Title: Bipartite interference and air pollution transport: estimating health effects of power plant interventions.
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823286/pdf/
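The two treatment components named in the abstract above can be sketched on a toy bipartite network. The definitions used here are assumptions for illustration and may differ from the paper's: the "key-associated" plant of an outcome unit is taken to be the interventional unit with the largest influence weight on it, and the "upwind" treatment is the influence-weighted share of treated plants among all the others. The function name and the example weights are hypothetical.

```python
def bipartite_exposures(influence, treated):
    """Compute (key-associated treatment, upwind treatment) per outcome unit.

    influence[k][j] is an assumed non-negative weight describing how strongly
    interventional unit j (a power plant) affects outcome unit k (a ZIP code).
    treated[j] is 1 if unit j received the intervention, else 0.
    """
    exposures = []
    for row in influence:
        # Assumed rule: the key plant is the one with the largest weight.
        key = max(range(len(row)), key=row.__getitem__)
        others = [(w, treated[j]) for j, w in enumerate(row) if j != key]
        total = sum(w for w, _ in others)
        # Weighted share of treated plants among the non-key plants.
        upwind = sum(w * z for w, z in others) / total if total > 0 else 0.0
        exposures.append((treated[key], upwind))
    return exposures


if __name__ == "__main__":
    influence = [[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]]
    treated = [1, 0, 1]
    for key_trt, upwind in bipartite_exposures(influence, treated):
        print(key_trt, round(upwind, 3))
```

In the study described above, the influence weights would come from the reduced-complexity atmospheric model, not from hand-picked numbers.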
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxae052
Eva Murphy, David Kline, Kathleen L Egan, Kathryn E Lancaster, William C Miller, Lance A Waller, Staci A Hepler
The opioid epidemic is a significant public health challenge in North Carolina, but limited data restrict our understanding of its complexity. Examining trends and relationships among different outcomes believed to reflect opioid misuse provides an alternative perspective to understand the opioid epidemic. We use a Bayesian dynamic spatial factor model to capture the interrelated dynamics within six different county-level outcomes, such as illicit opioid overdose deaths, emergency department visits related to drug overdose, treatment counts for opioid use disorder, patients receiving prescriptions for buprenorphine, and newly diagnosed cases of acute and chronic hepatitis C virus and human immunodeficiency virus. We design the factor model to yield meaningful interactions among predefined subsets of these outcomes, causing a departure from the conventional lower triangular structure in the loadings matrix and leading to familiar identifiability issues. To address this challenge, we propose a novel approach that involves decomposing the loadings matrix within a Markov chain Monte Carlo algorithm, allowing us to estimate the loadings and factors uniquely. As a result, we gain a better understanding of the spatio-temporal dynamics of the opioid epidemic in North Carolina.
Title: Understanding the opioid syndemic in North Carolina: A novel approach to modeling and identifying factors.
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823283/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf004
Yixi Xu, Yi Zhao
This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph presentation. Causal estimands and assumptions are discussed under this presentation. With a covariance matrix as the mediator, a low-rank representation is introduced and parametric mediation models are considered under the structural equation modeling framework. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the low-rank representation and causal parameters. An efficient computational algorithm is proposed and asymptotic properties of the estimators are investigated. Via simulation studies, the performance of the proposed approach is evaluated. In an application to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task.
Title: Mediation analysis with graph mediator.
Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11979487/pdf/
Pub Date: 2024-12-31 DOI: 10.1093/biostatistics/kxaf034
Dirk Douwes-Schultz, Alexandra M Schmidt, Laís Picinini Freitas, Marilia Sá Carvalho
Univariate zero-inflated models are increasingly being used to account for excess zeros in spatio-temporal infectious disease counts. However, the multivariate case is challenging due to the need to account for correlations across space, time and disease in both the count and zero-inflated components of the model. We are interested in comparing the transmission dynamics of several co-circulating infectious diseases across space and time, where some of the diseases can be absent for long periods. We first assume there is a baseline disease that is well-established and always present in the region. The other diseases switch between periods of presence and absence in each area through a series of coupled Markov chains, which account for long periods of disease absence, disease interactions and disease spread from neighboring areas. Since we are mainly interested in comparing the diseases, we assume the cases of the present diseases in an area jointly follow an autoregressive multinomial model. We use the multinomial model to investigate whether there are associations between certain factors, such as temperature, and differences in the transmission intensity of the diseases. Inference is performed using efficient Bayesian Markov chain Monte Carlo methods based on jointly sampling all unknown presence indicators. We apply the model to spatio-temporal counts of dengue, Zika, and chikungunya cases in Rio de Janeiro, during the first triple epidemic there.
Markov switching zero-inflated space-time multinomial models for comparing multiple infectious diseases. Dirk Douwes-Schultz, Alexandra M Schmidt, Laís Picinini Freitas, Marilia Sá Carvalho. Biostatistics 26(1). Pub Date: 2024-12-31. DOI: 10.1093/biostatistics/kxaf034. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12596980/pdf/
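The presence/absence switching described in the abstract above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the authors' model: it tracks a single non-baseline disease, arranges the areas on a line, and uses three hypothetical probabilities (`p_emerge`, `p_emerge_neighbor`, `p_persist`) in place of the coupled chains and covariate effects of the paper. The function name `simulate_presence` is likewise made up for illustration.

```python
import random


def simulate_presence(n_areas, n_weeks, p_emerge, p_emerge_neighbor, p_persist, seed=0):
    """Simulate 0/1 presence indicators for one disease in areas on a line.

    Hypothetical parameters (not taken from the paper):
      p_emerge          chance an absent disease appears with no affected neighbor
      p_emerge_neighbor chance it appears when a neighboring area had it last week
      p_persist         chance a present disease is still present next week
    Returns a list of n_weeks rows, each with one indicator per area.
    """
    rng = random.Random(seed)
    state = [0] * n_areas  # assumption: the disease starts absent everywhere
    history = []
    for _ in range(n_weeks):
        new_state = []
        for area in range(n_areas):
            # Last week's indicators in the adjacent areas (one or two of them).
            neighbors = [state[j] for j in (area - 1, area + 1) if 0 <= j < n_areas]
            if state[area]:
                p_present = p_persist
            elif any(neighbors):
                p_present = p_emerge_neighbor
            else:
                p_present = p_emerge
            new_state.append(1 if rng.random() < p_present else 0)
        state = new_state
        history.append(state)
    return history


if __name__ == "__main__":
    for week in simulate_presence(6, 12, 0.05, 0.4, 0.9, seed=1):
        print("".join(str(v) for v in week))
```

A small `p_emerge` combined with a large `p_persist` produces the long runs of absence followed by sustained presence that motivate a Markov switching structure, and `p_emerge_neighbor` stands in for spread from neighboring areas.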
Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxaf033
Emily Somerset, Justin J Slater, Patrick E Brown
We introduce a hierarchical Bayesian framework for reconstructing epidemic curves using under-reported case counts and wastewater data. Our approach models wastewater signals as differentiable Gaussian processes, enabling inference on their relative growth rates, which are used to define a wastewater-based reproduction rate. These estimates are incorporated into a binomially thinned Poisson autoregressive model for case counts using a modular inference strategy. We apply this framework to reconstruct the Covid-19 epidemic curve in Toronto, validating our model through out-of-sample forecasts and comparisons with independent serosurvey-based cumulative incidence estimates. We also apply the framework to New Zealand's Covid-19 data to reconstruct its epidemic curve and demonstrate improvements over an existing joint model for wastewater and case data. A key advantage of our framework, highlighted in this comparison, is that it does not rely on pre-specified constant parameters, allowing the model to better adapt to evolving pandemic conditions.
Wastewater-based reproduction rates for epidemic curve reconstruction. Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12533577/pdf/
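The idea of a binomially thinned Poisson autoregressive model, as named in the abstract above, can be sketched as follows. This is a toy illustration under stated assumptions rather than the authors' specification: latent cases on each day are drawn as Poisson with mean equal to that day's reproduction rate times the previous day's latent cases, and each latent case is reported independently with a fixed probability. The names `simulate_reported_cases`, `reproduction_rates`, and `report_prob` are hypothetical, and the rates are simply supplied by the caller instead of being inferred from wastewater signals.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_reported_cases(initial_cases, reproduction_rates, report_prob, seed=0):
    """Simulate latent and reported case counts, one pair per entry of reproduction_rates.

    Assumed toy model (not the paper's exact form):
      latent[t]   ~ Poisson(reproduction_rates[t] * latent[t - 1])
      reported[t] ~ Binomial(latent[t], report_prob)   # binomial thinning
    """
    rng = random.Random(seed)
    latent, reported = [], []
    previous = initial_cases
    for rate in reproduction_rates:
        current = sample_poisson(rng, rate * previous)
        # Thinning: keep each latent case independently with probability report_prob.
        observed = sum(1 for _ in range(current) if rng.random() < report_prob)
        latent.append(current)
        reported.append(observed)
        previous = current
    return latent, reported


if __name__ == "__main__":
    rates = [1.1] * 10 + [0.9] * 10  # hypothetical time-varying rates
    latent, reported = simulate_reported_cases(20, rates, report_prob=0.3, seed=1)
    print(latent)
    print(reported)
```

Because the reported series never exceeds the latent one, reconstructing the epidemic curve amounts to inferring the latent counts from the thinned observations, which is where the wastewater-derived rates supply extra information.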
Pub Date : 2024-12-31 DOI: 10.1093/biostatistics/kxae040
Álvaro Méndez-Civieta, Ying Wei, Keith M Diaz, Jeff Goldsmith
This paper introduces functional quantile principal component analysis (FQPCA), a dimensionality reduction technique that extends the concept of functional principal components analysis (FPCA) to the examination of participant-specific quantile curves. Our approach borrows strength across participants to estimate patterns in quantiles, and uses participant-level data to estimate loadings on those patterns. As a result, FQPCA is able to capture shifts in the scale and distribution of data that affect participant-level quantile curves, and is also a robust methodology suitable for dealing with outliers, heteroscedastic data or skewed data. The need for such methodology is exemplified by physical activity data collected using wearable devices. Participants often differ in the timing and intensity of physical activity behaviors, and capturing information beyond the participant-level expected value curves produced by FPCA is necessary for a robust quantification of diurnal patterns of activity. We illustrate our methods using accelerometer data from the National Health and Nutrition Examination Survey, and produce participant-level 10%, 50%, and 90% quantile curves over 24 h of activity. The proposed methodology is supported by simulation results, and is available as an R package.
Functional quantile principal component analysis. Biostatistics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823270/pdf/
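To make the notion of participant-level 10%, 50%, and 90% quantile curves concrete, the sketch below computes raw empirical quantiles separately at each time bin for one participant. This is not FQPCA: it does not borrow strength across participants or estimate components and loadings, and the function name `pointwise_quantile_curves` and the input layout (one list of activity values per day) are assumptions made for illustration.

```python
import statistics


def pointwise_quantile_curves(days, levels=(10, 50, 90)):
    """Empirical quantile curves for one participant.

    days:   list of equally long lists, one activity value per time bin per day
    levels: percentile levels (integers from 1 to 99)
    Returns {level: curve}, where each curve has one value per time bin.
    """
    n_bins = len(days[0])
    curves = {level: [] for level in levels}
    for b in range(n_bins):
        values = [day[b] for day in days]
        # 99 cut points; cuts[k - 1] is the k-th percentile of this time bin.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        for level in levels:
            curves[level].append(cuts[level - 1])
    return curves


if __name__ == "__main__":
    # Hypothetical data: 5 days, 2 time bins per day.
    days = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
    print(pointwise_quantile_curves(days))
```

With only a handful of days per participant, such raw curves are noisy, which is one reason a method that pools information across participants can be attractive.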