Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf009
Ariel Chao, Donna Spiegelman, Ashley Buchanan, Laura Forastiere
To leverage peer influence and amplify behavior change in populations, behavioral interventions often rely on peer-based strategies. A common study design for assessing such strategies is the egocentric-network randomized trial (ENRT), in which index participants receive behavioral training and are encouraged to disseminate information to their peers. Under this design, a crucial estimand is the Average Spillover Effect (ASpE), which measures the impact of the intervention on participants who do not receive it but whose outcomes may be affected by others who do. Assessing the ASpE relies on assumptions about, and correct measurement of, the interference sets within which individuals may influence one another's outcomes. Interference sets, such as networks in ENRTs, can be difficult to specify properly, and when they are mismeasured, intervention effects estimated by existing methods will be biased. In studies where social networks play an important role in disease transmission or behavior change, correcting ASpE estimates for bias due to network misclassification is critical for accurately evaluating the full impact of interventions. We combined measurement error and causal inference methods to bias-correct the ASpE estimate for network misclassification in ENRTs when surrogate networks are recorded in place of true ones and validation data relating the misclassified networks to the true ones are available. We investigated finite-sample properties of our methods in an extensive simulation study and illustrated them in the HIV Prevention Trials Network (HPTN) 037 study.
Title: Estimation and inference for causal spillover effects in egocentric-network randomized trials in the presence of network membership misclassification. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955068/pdf/
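As a point of reference for the ASpE, here is a minimal Python sketch of the kind of naive estimator that becomes biased under network misclassification: the difference in mean outcomes between non-index members of intervention networks and of control networks, taking the recorded (possibly surrogate) membership at face value. The record layout and the name `naive_aspe` are hypothetical, the estimator ignores clustering, and this is not the authors' bias-corrected method.

```python
from statistics import mean

# Hypothetical record layout:
# (network_id, network_in_intervention_arm, is_index_participant, outcome)
Record = tuple[str, bool, bool, float]


def naive_aspe(records: list[Record]) -> float:
    """Mean outcome of non-index members in intervention networks minus
    mean outcome of non-index members in control networks.

    Assumes recorded network membership is correct, which is exactly the
    assumption that misclassified (surrogate) networks violate.
    """
    treated_alters = [y for _, arm, is_index, y in records if arm and not is_index]
    control_alters = [y for _, arm, is_index, y in records if not arm and not is_index]
    return mean(treated_alters) - mean(control_alters)


records = [
    ("A", True, True, 1.0), ("A", True, False, 1.0), ("A", True, False, 0.0),
    ("B", False, True, 0.0), ("B", False, False, 0.0), ("B", False, False, 0.0),
]
print(naive_aspe(records))  # 0.5
# One wrongly recorded member of network A with outcome 0 shifts the estimate.
print(naive_aspe(records + [("A", True, False, 0.0)]))  # about 0.333
```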
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf011
Iuliana Ciocănea-Teodorescu, Erin E Gabriel, Arvid Sjölander
For a comprehensive understanding of the effect of a given treatment on an outcome of interest, quantification of individual treatment heterogeneity is essential, alongside estimation of the average causal effect. However, even in randomized controlled trials, quantities such as the probability of benefit or the probability of harm are not identifiable, since multiple potential outcomes cannot be observed simultaneously for the same individual. We propose a sensitivity analysis for the probability of benefit in randomized controlled trial settings with a binary treatment and a binary outcome, by quantifying the deviation from conditional independence of the two potential outcomes, given a set of measured prognostic baseline covariates. We do this using a marginal sensitivity analysis parameter that does not depend on the number or complexity of the measured covariates. We provide a guide to estimation and interpretation, and illustrate our method in simulations, as well as using a real data example from a randomized controlled trial studying the effect of umbilical vein oxytocin administration on the need for manual removal of the placenta during birth.
Title: Sensitivity analysis for the probability of benefit in randomized controlled trials with a binary treatment and a binary outcome. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12129078/pdf/
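Two reference points for the probability of benefit can be computed from the stratum- and arm-specific outcome probabilities that a randomized trial does identify: its value under exact conditional independence of Y(1) and Y(0) given X (the assumption whose violation the proposed sensitivity parameter quantifies), and the Fréchet bounds that hold with no assumption on the joint distribution. The Python sketch below assumes discrete covariate strata and that Y = 1 is the favorable outcome, so benefit means Y(1) = 1 and Y(0) = 0; the function name and inputs are illustrative, and the authors' sensitivity parameter is not implemented.

```python
def prob_benefit_summary(weights, p1, p0):
    """Return (PoB under conditional independence, lower bound, upper bound).

    weights: P(X = x) for each covariate stratum (must sum to 1)
    p1, p0:  P(Y = 1 | A = 1, X = x) and P(Y = 1 | A = 0, X = x),
             both identified by randomization.
    Assumes Y = 1 is favorable, so benefit is Y(1) = 1 and Y(0) = 0.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    # If Y(1) and Y(0) are independent given X:
    # P(Y(1) = 1, Y(0) = 0 | x) = p1(x) * (1 - p0(x)).
    ci = sum(w * a * (1 - b) for w, a, b in zip(weights, p1, p0))
    # Frechet bounds within each stratum, averaged over the covariate distribution.
    lower = sum(w * max(0.0, a - b) for w, a, b in zip(weights, p1, p0))
    upper = sum(w * min(a, 1 - b) for w, a, b in zip(weights, p1, p0))
    return ci, lower, upper


# Two hypothetical strata of equal size.
print(prob_benefit_summary([0.5, 0.5], [0.8, 0.6], [0.4, 0.5]))  # roughly (0.39, 0.25, 0.55)
```

The conditional-independence value always lies inside the bounds, which is why deviations from conditional independence are a natural axis along which to vary a sensitivity analysis.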
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf036
Katia Colaneri, Camilla Damian, Rüdiger Frey
In this paper, we consider a discrete-time stochastic SIR model in which the transmission rate and the number of infectious individuals are random and unobservable. This model accounts for random fluctuations in infectiousness and for undetected infections. Thus, statistical inference has to be performed in a partial-information setting. We adopt a Bayesian approach and use nested particle filtering to estimate the state of the system and the parameters. Moreover, we discuss forecasts and model tests based on the posterior predictive distribution. As a case study, we apply our methodology to Austrian Covid-19 infection data.
Title: A filtering approach for statistical inference in a stochastic SIR model with an application to Covid-19 data. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12554006/pdf/
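To make the partial-information setting concrete, here is a minimal Python simulator of a discrete-time stochastic SIR model in which the log transmission rate follows a random walk and only a fraction of new infections is reported. Everything here (binomial transitions, the random-walk form, `report_prob`, the parameter values) is an assumption for illustration rather than the authors' specification, and the nested particle filter used for inference is not shown: the simulator only produces the kind of latent path (`beta`, `I`) that such a filter would have to reconstruct from the observed counts.

```python
import math
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Binomial draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_sir(n_pop=1000, i0=10, gamma=0.2, log_beta0=math.log(0.4),
                 sigma=0.1, report_prob=0.5, steps=60, seed=1):
    """Simulate a hypothetical stochastic SIR model with random transmission rate.

    Returns a list of (S, I, R, beta, observed_cases) per step. Only
    observed_cases would be available to an analyst; beta and I are latent.
    """
    rng = random.Random(seed)
    s, i, r = n_pop - i0, i0, 0
    log_beta = log_beta0
    path = []
    for _ in range(steps):
        log_beta += rng.gauss(0.0, sigma)          # random fluctuation in infectiousness
        beta = math.exp(log_beta)
        p_inf = 1.0 - math.exp(-beta * i / n_pop)  # per-susceptible infection probability
        new_inf = binomial(s, p_inf, rng)
        new_rec = binomial(i, gamma, rng)
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        observed = binomial(new_inf, report_prob, rng)  # undetected infections are dropped
        path.append((s, i, r, beta, observed))
    return path


for s, i, r, beta, obs in simulate_sir()[:5]:
    print(s, i, r, round(beta, 3), obs)
```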
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf045
Zhe Chen, Siyu Heng, Asa Tapley, Stephen De Rosa, Bo Zhang
A key objective in vaccine studies is to evaluate vaccine-induced immunogenicity and determine whether participants have mounted a response to the vaccine. Cellular immune responses are essential for assessing vaccine-induced immunogenicity, and single-cell assays, such as intracellular cytokine staining (ICS) and B-cell phenotyping (BCP), are commonly employed to profile individual immune cell phenotypes and the cytokines they produce after stimulation. In this article, we introduce a novel statistical framework for identifying vaccine responders using ICS data collected before and after vaccination. This framework incorporates paired control data to account for potential unintended variations between assay runs, such as batch effects, that could lead to misclassification of participants as vaccine responders or non-responders. To formally integrate paired control data to account for assay variation across time points (i.e., before and after vaccination), our proposed framework calculates and reports two $ P $-values, both adjusting for paired control data but in distinct ways: (i) the maximally adjusted $ P $-value, which applies the most conservative adjustment to the unadjusted $ P $-value, ensuring validity over all plausible batch effects consistent with the paired control samples' data, and (ii) the minimally adjusted $ P $-value, which imposes only the minimal adjustment to the unadjusted $ P $-value such that the adjusted $ P $-value cannot be falsified by the paired control samples' data. Together, the minimally and maximally adjusted $ P $-values offer a balanced approach to managing Type I error rates and statistical power in the presence of batch effects. We apply this framework to analyze ICS data collected at baseline and 4 weeks post-vaccination from the COVID-19 Prevention Network (CoVPN) 3008 study.
Our analysis helps address two clinical questions: (i) which participants exhibited evidence of an incident Omicron infection between baseline and 4 weeks after receiving the final dose of the primary vaccination series, and (ii) which participants showed vaccine-induced T cell responses against the Omicron BA.4/5 Spike protein.
Title: Determining vaccine responders in the presence of baseline immunity using single-cell assays and paired control samples. Journal: Biostatistics 26(1).
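The abstract does not spell out how the adjusted $ P $-values are constructed, so the following Python sketch is only a toy illustration of the general idea, not the authors' framework. It treats the batch effect as an unknown additive shift on the log-odds scale, takes the shifts "consistent with" the paired control sample to be that sample's 95% Wald interval, and reports the one-sided $ P $-value at the least favorable shift (maximal adjustment) and at the shift closest to zero (minimal adjustment). The log-odds-ratio test, the interval rule, the counts, and all function names are assumptions; all counts must be positive.

```python
import math


def log_odds_ratio(pos_post, n_post, pos_pre, n_pre):
    """Wald log odds ratio (post vs pre) of cytokine-positive cells and its SE."""
    a, b = pos_post, n_post - pos_post
    c, d = pos_pre, n_pre - pos_pre
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def upper_p(z):
    """One-sided P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def toy_adjusted_p_values(sample, control, z_crit=1.96):
    """Return (unadjusted, min-adjusted, max-adjusted) one-sided P-values.

    sample, control: (pos_post, n_post, pos_pre, n_pre) for the stimulated
    sample and its paired control sample (hypothetical input layout).
    """
    lor, se = log_odds_ratio(*sample)
    c_lor, c_se = log_odds_ratio(*control)
    # Hypothetical set of batch shifts not ruled out by the control data.
    lo, hi = c_lor - z_crit * c_se, c_lor + z_crit * c_se

    def p_at(delta):
        # P-value after removing an assumed batch shift delta; increases with delta.
        return upper_p((lor - delta) / se)

    nearest_zero = min(max(0.0, lo), hi)  # smallest shift consistent with controls
    return p_at(0.0), p_at(nearest_zero), p_at(hi)


print(toy_adjusted_p_values((60, 10000, 20, 10000), (22, 10000, 20, 10000)))
```

When the control interval contains zero, the minimally adjusted value coincides with the unadjusted one, while the maximally adjusted value guards against the largest shift the controls allow.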
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf044
Sang Kyu Lee, Seonjin Kim, Mi-Ok Kim, Katherine L Grantz, Hyokyoung G Hong
Addressing health disparities across demographic groups remains a critical challenge in public health, with significant gaps in understanding how these disparities evolve over time. This paper extends the traditional Peters-Belson decomposition to a longitudinal setting, focusing on the role of a single explanatory variable, referred to as a modifier, that captures complex interactions with other covariates. The proposed method partitions disparities into 3 components: (i) the portion associated with differences in the conditional distribution of covariates, evaluated under a common distribution of the modifier across groups; (ii) the portion arising from differences in the distribution of the modifier and its interactions with other covariates; and (iii) the unexplained disparity not accounted for by observed covariates. Rather than aggregating the first 2 components into one "explained disparity," the proposed method allows for a separate characterization of temporal patterns in disparities, distinguishing those that are unassociated with the modifier from those that are associated with it. We illustrate the method using a fetal growth study, examining disparities in fetal development trajectories across racial and ethnic groups during pregnancy.
Title: Decomposition of longitudinal disparities: an application to the fetal growth-singletons study. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701353/pdf/
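For reference, the traditional cross-sectional Peters-Belson decomposition that this paper extends can be sketched in a few lines of Python: fit an outcome model in a reference group, apply it to the comparison group's covariates, and split the mean difference into an explained part (due to covariate differences) and an unexplained remainder. This sketch uses a single covariate and ordinary least squares; the longitudinal setting, the modifier, and the three-component split are not implemented, and the function names are illustrative.

```python
from statistics import mean


def fit_ols(x, y):
    """Simple linear regression y = a + b * x by least squares."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def peters_belson(x_ref, y_ref, x_cmp, y_cmp):
    """Two-part Peters-Belson decomposition of mean(y_ref) - mean(y_cmp).

    The outcome model is fitted in the reference group only, then applied to
    the comparison group's covariates.
    """
    a, b = fit_ols(x_ref, y_ref)
    y_cmp_pred = mean(a + b * xi for xi in x_cmp)  # comparison group "as if" reference
    total = mean(y_ref) - mean(y_cmp)
    explained = mean(y_ref) - y_cmp_pred           # attributable to covariate differences
    unexplained = y_cmp_pred - mean(y_cmp)         # not accounted for by the covariate
    return total, explained, unexplained


# Hypothetical data: total is about 4.33, explained 3.0, unexplained about 1.33.
print(peters_belson([1, 2, 3, 4], [3, 5, 7, 9], [0, 1, 2], [0, 2, 3]))
```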
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxae040
Álvaro Méndez-Civieta, Ying Wei, Keith M Diaz, Jeff Goldsmith
This paper introduces functional quantile principal component analysis (FQPCA), a dimensionality reduction technique that extends the concept of functional principal component analysis (FPCA) to the examination of participant-specific quantile curves. Our approach borrows strength across participants to estimate patterns in quantiles and uses participant-level data to estimate loadings on those patterns. As a result, FQPCA is able to capture shifts in the scale and distribution of data that affect participant-level quantile curves, and it is also a robust methodology suitable for outliers, heteroscedastic data, or skewed data. The need for such methodology is exemplified by physical activity data collected using wearable devices. Participants often differ in the timing and intensity of physical activity behaviors, and capturing information beyond the participant-level expected value curves produced by FPCA is necessary for a robust quantification of diurnal patterns of activity. We illustrate our methods using accelerometer data from the National Health and Nutrition Examination Survey, and produce participant-level 10%, 50%, and 90% quantile curves over 24 h of activity. The proposed methodology is supported by simulation results and is available as an R package.
Title: Functional quantile principal component analysis. Journal: Biostatistics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823270/pdf/
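Quantile-based methods such as FQPCA rest on the check (pinball) loss, whose minimizer over a location parameter is an empirical quantile. The Python sketch below shows only that building block and uses it to compute one participant's hourly quantile curve from minute-level activity counts; the input layout and function names are hypothetical, and the shared quantile components and participant-level loadings that FQPCA estimates are not implemented. The brute-force search over observed values is chosen for clarity rather than speed.

```python
def check_loss(u: float, tau: float) -> float:
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_check_loss(y, tau):
    """Observed value that minimizes the total check loss (first one if tied)."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


def hourly_quantile_curve(minute_counts, tau):
    """Participant-level tau-quantile for each hour of one day.

    minute_counts: 1440 minute-level activity values (hypothetical layout).
    """
    return [empirical_quantile_by_check_loss(minute_counts[h * 60:(h + 1) * 60], tau)
            for h in range(24)]


print(empirical_quantile_by_check_loss([3, 1, 4, 1, 5, 9, 2], 0.5))  # 3
```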
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxae052
Eva Murphy, David Kline, Kathleen L Egan, Kathryn E Lancaster, William C Miller, Lance A Waller, Staci A Hepler
The opioid epidemic is a significant public health challenge in North Carolina, but limited data restrict our understanding of its complexity. Examining trends and relationships among different outcomes believed to reflect opioid misuse provides an alternative perspective to understand the opioid epidemic. We use a Bayesian dynamic spatial factor model to capture the interrelated dynamics within six different county-level outcomes, such as illicit opioid overdose deaths, emergency department visits related to drug overdose, treatment counts for opioid use disorder, patients receiving prescriptions for buprenorphine, and newly diagnosed cases of acute and chronic hepatitis C virus and human immunodeficiency virus. We design the factor model to yield meaningful interactions among predefined subsets of these outcomes, causing a departure from the conventional lower triangular structure in the loadings matrix and leading to familiar identifiability issues. To address this challenge, we propose a novel approach that involves decomposing the loadings matrix within a Markov chain Monte Carlo algorithm, allowing us to estimate the loadings and factors uniquely. As a result, we gain a better understanding of the spatio-temporal dynamics of the opioid epidemic in North Carolina.
Title: Understanding the opioid syndemic in North Carolina: A novel approach to modeling and identifying factors. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823283/pdf/
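The identifiability issue mentioned in this abstract can be seen directly: for any orthogonal matrix $R$, the loadings $\Lambda R$ and factors $R^\top F$ reproduce exactly the same mean $\Lambda F$, so constraints such as fixed zeros in the loadings are needed to single out one solution. The Python sketch below uses hypothetical 6 x 2 loadings and 2 x 3 factor values; it only demonstrates this rotation invariance and is not the authors' MCMC-based decomposition of the loadings matrix.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation(theta):
    """2 x 2 orthogonal rotation matrix."""
    return [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]


# Hypothetical loadings (6 outcomes x 2 factors) and factors (2 factors x 3 time points).
loadings = [[1.0, 0.0], [0.8, 0.0], [0.5, 0.3], [0.0, 1.0], [0.0, 0.7], [0.4, 0.2]]
factors = [[0.2, -0.1, 0.5], [1.0, 0.3, -0.4]]

r = rotation(0.7)
rotated_loadings = matmul(loadings, r)           # Lambda R
rotated_factors = matmul(transpose(r), factors)  # R^T F

mean_original = matmul(loadings, factors)
mean_rotated = matmul(rotated_loadings, rotated_factors)

# The means agree, but the rotated loadings no longer have a zero in row 1, column 2.
print(rotated_loadings[0])
```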
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf010
Yizhen Xu, Scott Zeger, Zheyu Wang
The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention before significant loss of brain function. The main challenge to early detection lies in the absence of direct observation of the disease state and the considerable variability in both biomarkers and disease dynamics among individuals. Recent research has hypothesized the existence of subgroups with distinct biomarker patterns due to co-morbidities and differing degrees of brain resilience. Our ability to diagnose early and intervene during the preclinical stage of neurodegenerative diseases will be enhanced by further insights into heterogeneity in the biomarker-disease relationship. In this article, we focus on Alzheimer's disease (AD) and attempt to identify the systematic patterns within the heterogeneous AD biomarker-disease cascade. Specifically, we quantify disease progression using a dynamic latent variable whose mixture distribution represents patient subgroups. Model estimation uses Hamiltonian Monte Carlo, with the number of clusters determined by the Bayesian Information Criterion. We report simulation studies that investigate the performance of the proposed model in finite-sample settings similar to our motivating application. We apply the proposed model to the Biomarkers of Cognitive Decline Among Normal Individuals data, a longitudinal study conducted over 2 decades among individuals who were initially cognitively normal. Our application yields evidence consistent with the hypothetical model of biomarker dynamics presented in Jack Jr et al. In addition, our analysis identified 2 subgroups with distinct disease-onset patterns. Finally, we develop a dynamic prediction approach to improve the precision of prognoses.
Title: Probabilistic clustering using shared latent variable model for assessing Alzheimer's disease biomarkers. Journal: Biostatistics 26(1). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054513/pdf/
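Choosing the number of clusters by the Bayesian Information Criterion can be illustrated with a much simpler model than the one in this paper. The Python sketch below swaps in a one-dimensional Gaussian mixture fitted by EM (not the authors' dynamic latent-variable mixture fitted by Hamiltonian Monte Carlo) and compares BIC = -2 log L + p log n across candidate numbers of components. The deterministic quantile-based initialization, the variance floor, and all names are assumptions made for this sketch.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(x, k, iters=200):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances, loglik)."""
    xs = sorted(x)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(xs) / n
    variances = [sum((v - mu_all) ** 2 for v in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior component probabilities for each observation.
        resp = []
        for v in xs:
            dens = [w * normal_pdf(v, m, s2) for w, m, s2 in zip(weights, means, variances)]
            tot = sum(dens) or 1e-300
            resp.append([d / tot for d in dens])
        # M-step: update weights, means, and variances (with a small floor).
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, xs)) / nj, 1e-6)
    loglik = sum(math.log(max(sum(w * normal_pdf(v, m, s2)
                                  for w, m, s2 in zip(weights, means, variances)), 1e-300))
                 for v in xs)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2 * loglik + n_params * math.log(n)


rng = random.Random(0)
data = [rng.gauss(-5, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
for k in (1, 2, 3):
    print(k, round(bic(fit_gmm_1d(data, k)[3], k, len(data)), 1))
```

The model with the smallest BIC is selected; the $p \log n$ penalty is what stops extra components from always winning on likelihood alone.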
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf017
Danielle Demateis, Sandra India-Aldana, Robert O Wright, Rosalind J Wright, Andrea Baccarelli, Elena Colicino, Ander Wilson, Kayleigh P Keller
Epidemiological evidence supports an association between exposure to air pollution during pregnancy and both birth and child health outcomes. Typically, such associations are estimated by regressing an outcome on daily or weekly measures of exposure during pregnancy using a distributed lag model. However, these associations may be modified by multiple factors. We propose a distributed lag interaction model with index modification that allows for effect modification of a functional predictor by a weighted average of multiple modifiers. Our model allows for simultaneous estimation of the modifier index weights and the exposure-time-response function via a spline cross-basis in a Bayesian hierarchical framework. Through simulations, we showed that our model outperforms competing methods when there are multiple modifiers of unknown importance. We applied our proposed method to a Colorado birth cohort to estimate the association between birth weight and air pollution modified by a neighborhood-vulnerability index, and to a Mexican birth cohort to estimate the association between birthing-parent cardiometabolic endpoints and air pollution modified by a birthing-parent lifetime stress index.
Title: Distributed lag interaction model with index modification. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205949/pdf/
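To make the model structure concrete, the sketch below computes the linear predictor of a distributed lag model whose exposure-time-response surface β(t, E) is modified by an index E, formed as a weighted average of several modifiers. It is illustrative only. It assumes simple polynomial bases in place of the splines used in the paper. The coefficients `theta` and the modifier weights are supplied by hand here, whereas the authors estimate them jointly in a Bayesian hierarchical framework, which is not reproduced. All function names are hypothetical.

```python
from typing import Sequence


def modifier_index(modifiers: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of modifiers; weights are rescaled to sum to one."""
    if len(modifiers) != len(weights):
        raise ValueError("modifiers and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    return sum(m * w for m, w in zip(modifiers, weights)) / total


def beta_surface(lag: int, index: float,
                 theta: Sequence[Sequence[float]], n_lags: int) -> float:
    """Exposure-time-response beta(t, E) from a tensor-product (cross) basis.

    Polynomial bases stand in for splines: b_j(t) = s**j, with s = t / (n_lags - 1)
    rescaled to [0, 1], and c_k(E) = E**k. theta[j][k] multiplies b_j(t) * c_k(E).
    """
    s = lag / (n_lags - 1) if n_lags > 1 else 0.0
    return sum(
        theta[j][k] * s ** j * index ** k
        for j in range(len(theta))
        for k in range(len(theta[j]))
    )


def linear_predictor(exposures: Sequence[float], modifiers: Sequence[float],
                     weights: Sequence[float], theta: Sequence[Sequence[float]],
                     intercept: float = 0.0) -> float:
    """eta_i = intercept + sum_t x_it * beta(t, E_i), with E_i the modifier index."""
    e = modifier_index(modifiers, weights)
    n_lags = len(exposures)
    return intercept + sum(
        x * beta_surface(t, e, theta, n_lags) for t, x in enumerate(exposures)
    )
```

For example, with `theta = [[0.0, 2.0]]` the lag-response is flat over time and equal to 2E, so raising the modifier index scales the cumulative exposure effect proportionally. Richer `theta` matrices let the timing of susceptible windows, not just their magnitude, change with the index.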
Pub Date: 2025-12-31; DOI: 10.1093/biostatistics/kxaf012
Kinnary Shah, Boyi Guo, Stephanie C Hicks
An important task in the analysis of spatially resolved transcriptomics (SRT) data is to identify spatially variable genes (SVGs), that is, genes whose expression varies across 2D space. Current approaches rank SVGs based on either P-values or an effect size, such as the proportion of spatial variance. However, previous work in the analysis of RNA-sequencing data identified a technical bias introduced by log-transformation, which violates the "mean-variance relationship" of gene counts: highly expressed genes are more likely to have higher variance in raw counts but lower variance after log-transformation. Here, we demonstrate that this mean-variance relationship is also present in SRT data. Furthermore, we propose spoon, a statistical framework using empirical Bayes techniques to remove this bias, leading to more accurate prioritization of SVGs. We demonstrate the performance of spoon on both simulated and real SRT data. A software implementation of our method is available at https://bioconductor.org/packages/spoon.
Title: Addressing the mean-variance relationship in spatially resolved transcriptomics data with spoon. Journal: Biostatistics, vol. 26, no. 1. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166475/pdf/
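The mean-variance behavior described in the abstract can be checked numerically. The sketch below assumes Poisson-distributed counts, which is a simplifying assumption because real SRT counts are typically overdispersed. It computes the exact variance of raw and `log1p`-transformed counts by summing the probability mass function, alongside the first-order delta-method approximation Var(log(1 + X)) ≈ μ/(1 + μ)². It illustrates the bias only. It is not spoon's empirical Bayes correction, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def variance_of_transform(mu: float, transform, tail_sds: float = 12.0) -> float:
    """Var(transform(X)) for X ~ Poisson(mu), summing the pmf over a truncated support."""
    upper = int(mu + tail_sds * math.sqrt(mu)) + 20
    ks = range(upper + 1)
    probs = [poisson_pmf(k, mu) for k in ks]
    vals = [transform(k) for k in ks]
    mean = sum(p * v for p, v in zip(probs, vals))
    return sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))


def delta_method_log1p_var(mu: float) -> float:
    """First-order approximation: Var(log(1 + X)) ~ mu / (1 + mu)**2."""
    return mu / (1.0 + mu) ** 2


if __name__ == "__main__":
    for mu in (1.0, 5.0, 20.0, 100.0):
        raw = variance_of_transform(mu, float)
        logged = variance_of_transform(mu, math.log1p)
        print(f"mean={mu:6.1f}  var(counts)={raw:8.3f}  var(log1p)={logged:.4f}  "
              f"delta approx={delta_method_log1p_var(mu):.4f}")
```

Running it should show the raw-count variance growing with the mean while the log-scale variance shrinks, approaching the delta-method value as the mean increases. This is the pattern that makes log-scale variance-based rankings favor lowly expressed genes.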