Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae053
Yingfa Xie, Haoda Fu, Yuan Huang, Vladimir Pozdnyakov, Jun Yan
Patients with type 2 diabetes need to monitor blood sugar levels closely as part of routine diabetes self-management. Although many treatment agents aim to control blood sugar tightly, hypoglycemia is a common adverse event. In practice, patients observe hypoglycemic events more easily than hyperglycemic events because they perceive neurogenic symptoms. We propose to model each observed hypoglycemic event of a patient as a lower-boundary crossing of a reflected Brownian motion with an upper reflecting barrier. The lower boundary is set by clinical standards. To capture patient heterogeneity and within-patient dependence, covariates and a patient-level frailty are incorporated into the volatility and the upper reflecting barrier. This framework quantifies the variability of the underlying glucose level, patient heterogeneity, and the impact of risk factors on glucose. Inference is carried out in a Bayesian framework using Markov chain Monte Carlo. Two model comparison criteria, the deviance information criterion and the logarithm of the pseudo-marginal likelihood, are used for model selection. The methodology is validated in simulation studies. In an analysis of data from diabetic patients in the DURABLE trial, our model provides an adequate fit, generates data similar to the observed data, and offers insights that could be missed by other models.
Title: Recurrent events modeling based on a reflected Brownian motion with application to hypoglycemia. Journal: Biostatistics 26(1).
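The event mechanism described above can be illustrated with a minimal simulation sketch. It assumes a driftless Brownian motion on an arbitrary latent scale and a simple Euler discretization. The function `rbm_hitting_time`, the barrier values, and the log-linear volatility with a frailty term are hypothetical choices for illustration, not the authors' specification. The path is reflected at the upper barrier, and an event is recorded at the first step at or below the lower boundary.

```python
import math
import random


def rbm_hitting_time(x0, upper, lower, sigma, dt=0.01, t_max=100.0, seed=None):
    """Euler sketch: driftless Brownian motion with volatility `sigma`,
    reflected at `upper` and stopped at its first step at or below `lower`.

    Returns (hitting_time, path); hitting_time is None if `lower` is not
    reached by `t_max`.
    """
    if not lower < x0 <= upper:
        raise ValueError("need lower < x0 <= upper")
    rng = random.Random(seed)
    step_sd = sigma * math.sqrt(dt)
    x, t, path = x0, 0.0, [x0]
    while t < t_max:
        x += rng.gauss(0.0, step_sd)
        if x > upper:
            x = 2.0 * upper - x  # mirror any overshoot back below the barrier
        t += dt
        path.append(x)
        if x <= lower:  # lower-boundary crossing = one observed event
            return t, path
    return None, path


# Hypothetical patient: log-volatility = intercept + frailty (values made up).
frailty = 0.2
sigma_i = math.exp(-0.5 + frailty)
t_event, path = rbm_hitting_time(x0=1.0, upper=2.0, lower=0.0, sigma=sigma_i, seed=1)
```

For this driftless case, the expected time to reach the lower boundary a from x0, with reflection at b, is (x0 - a)(2b - x0 - a)/σ². That formula gives a quick check on averages over many simulated paths. Because crossings between grid points are missed, simulated times run slightly long. How the process continues after an event, which matters for recurrent events, is not modeled here.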
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae046
Yiqun T Chen, Lucy L Gao
For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common approach to interpretation and validation is to test for differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the same data are used both to estimate the clusters and to test for differences between them. To overcome this problem, we propose a new test for the difference in means of a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test controls the selective Type I error rate in finite samples and can be computed efficiently. We further illustrate the validity and power of our proposal in simulations and demonstrate its use on single-cell RNA-sequencing data.
Title: Testing for a difference in means of a single feature after clustering. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687323/pdf/
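The inflated Type I error rate can be seen in a small simulation. This sketch uses hypothetical settings: one Gaussian feature, n = 100, and k-means with k = 2. The same feature is used both to cluster and to test, which is the most extreme form of the problem. The selective test proposed in the paper is not implemented here; the code only shows why a naive test fails.

```python
import math
import random
import statistics


def two_means_1d(x, iters=50):
    """Lloyd's algorithm for k-means with k = 2 on a single feature."""
    c0, c1 = min(x), max(x)
    for _ in range(iters):
        g0 = [v for v in x if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in x if abs(v - c0) > abs(v - c1)]
        new_c0, new_c1 = statistics.fmean(g0), statistics.fmean(g1)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return g0, g1


def naive_pvalue(g0, g1):
    """Two-sided z-test p-value for equal means, ignoring how the groups arose."""
    se = math.sqrt(statistics.variance(g0) / len(g0) + statistics.variance(g1) / len(g1))
    z = (statistics.fmean(g1) - statistics.fmean(g0)) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_rejection_rate(n=100, reps=200, alpha=0.05, seed=0):
    """Fraction of null data sets (a single Gaussian, no true clusters) in which
    the naive test declares the two estimated clusters' means different."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g0, g1 = two_means_1d(x)
        rejections += naive_pvalue(g0, g1) < alpha
    return rejections / reps


rate = naive_rejection_rate()
```

A valid test at level 0.05 should reject in about 5% of these null data sets. Because k-means places the split where the sample separates most, the naive test rejects in most of them.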
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxaf019
Yi Zhao, Xi Luo, Michael E Sobel, Martin A Lindquist, Brian S Caffo
A primary goal of task-based functional magnetic resonance imaging (fMRI) studies is to quantify the effective connectivity between brain regions when stimuli are presented. Assessing the dynamics of effective connectivity has attracted increasing attention. Causal mediation analysis is a widely used tool for delineating the mechanism linking task stimuli to brain activations. However, the case in which the treatment, mediator, and outcome are all continuous functions has not been studied. This work considers causal mediation analysis for functional data. Semiparametric functional linear structural equation models are introduced, and the required causal assumptions are discussed. The proposed models allow the estimation of individual effect curves. The models are applied to a task-based fMRI study, providing a new perspective on dynamic brain connectivity. The R package cfma, which implements the method, is available on CRAN.
Title: Causal functional mediation analysis with an application to functional magnetic resonance imaging data. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206356/pdf/
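Mediation with functional treatment, mediator, and outcome can be illustrated in a deliberately simplified form. This sketch fits a concurrent (pointwise) model: at each grid point t, it regresses M(t) on Z(t) and Y(t) on Z(t) and M(t) across subjects. It then reports the product-of-coefficients indirect effect α(t)β(t) and the direct effect γ(t). This is not the semiparametric functional SEM of the paper, which may use more of each function's history. The product-of-coefficients reading also assumes linear models with no exposure-mediator interaction and no unmeasured confounding. All names and simulated values are hypothetical.

```python
import random


def _slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def _ols_two(x1, x2, y):
    """Slopes of y on (x1, x2) with an intercept, via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def concurrent_mediation(Z, M, Y):
    """Z, M, Y: one list per subject, sampled on a shared time grid.
    Returns a list of (indirect = alpha(t) * beta(t), direct = gamma(t))."""
    curves = []
    for t in range(len(Z[0])):
        z = [row[t] for row in Z]
        m = [row[t] for row in M]
        y = [row[t] for row in Y]
        alpha = _slope(z, m)              # M(t) = a + alpha(t) Z(t) + error
        gamma, beta = _ols_two(z, m, y)   # Y(t) = b + gamma(t) Z(t) + beta(t) M(t) + error
        curves.append((alpha * beta, gamma))
    return curves


# Simulated data with a time-varying alpha(t), beta = 0.5 and gamma = 0.3.
rng = random.Random(0)
n, T = 2000, 10
alpha_true = [1.0 + t / T for t in range(T)]
Z = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(n)]
M = [[alpha_true[t] * z[t] + rng.gauss(0.0, 0.5) for t in range(T)] for z in Z]
Y = [[0.3 * z[t] + 0.5 * m[t] + rng.gauss(0.0, 0.5) for t in range(T)]
     for z, m in zip(Z, M)]
curves = concurrent_mediation(Z, M, Y)
```

Here the true indirect-effect curve is 0.5·α(t) and the direct effect is constant at 0.3, so the estimated curves should track those values.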
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae020
Wei Zong, Danyang Li, Marianne L Seney, Colleen A Mcclung, George C Tseng
High-dimensional omics data often contain intricate and multifaceted information, so that multiple plausible sample partitions, based on different subsets of selected features, can coexist. Conventional clustering methods typically yield a single clustering solution, limiting their capacity to capture all facets of the cluster structure in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, in which the outer mixture assigns gene features to facets and the Gaussian mixture models assign samples to clusters. Simulation studies demonstrate the superior facet and cluster assignment accuracy of MFClust. The proposed method is applied to three transcriptomic datasets from postmortem brain and lung disease studies. The results capture multifacet clustering structures associated with critical clinical variables and provide intriguing biological insights for further hypothesis generation and discovery.
Title: Model-based multifacet clustering with high-dimensional omics applications. Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823124/pdf/
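The building block that assigns samples to clusters is a Gaussian mixture fitted by EM. This sketch covers only that inner layer, for a single feature and two components. The outer mixture that assigns genes to facets in MFClust is not implemented. Function names, starting values, and the variance floor are illustrative assumptions.

```python
import math
import random


def _log_normal(v, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mu) ** 2 / var)


def em_two_component_gmm(x, iters=200, var_floor=1e-6):
    """EM for a two-component univariate Gaussian mixture.
    Returns (weights, means, variances, resp), where resp[i] is the posterior
    probability that sample i belongs to component 1."""
    mu = [min(x), max(x)]  # spread-out starting means
    mean_x = sum(x) / len(x)
    v0 = sum((v - mean_x) ** 2 for v in x) / len(x)
    var, w = [v0, v0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities from log densities, computed without overflow
        resp = []
        for v in x:
            d = (math.log(w[0]) + _log_normal(v, mu[0], var[0])) - (
                math.log(w[1]) + _log_normal(v, mu[1], var[1]))
            resp.append(1.0 / (1.0 + math.exp(d)) if d < 0
                        else math.exp(-d) / (1.0 + math.exp(-d)))
        # M-step: weighted means, variances and mixing proportions
        n1 = sum(resp)
        n0 = len(x) - n1
        mu = [sum((1 - r) * v for r, v in zip(resp, x)) / n0,
              sum(r * v for r, v in zip(resp, x)) / n1]
        var = [max(sum((1 - r) * (v - mu[0]) ** 2 for r, v in zip(resp, x)) / n0, var_floor),
               max(sum(r * (v - mu[1]) ** 2 for r, v in zip(resp, x)) / n1, var_floor)]
        w = [n0 / len(x), n1 / len(x)]
    return w, mu, var, resp


rng = random.Random(42)
labels = [0] * 200 + [1] * 200
x = [rng.gauss(-2.0, 1.0) if k == 0 else rng.gauss(2.0, 1.0) for k in labels]
w, mu, var, resp = em_two_component_gmm(x)
```

Samples are assigned to the component with the larger responsibility. In a multifacet setting, a separate partition of this kind would be estimated for each facet's subset of genes.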
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxaf028
Andrea Sottosanti, Enrico Bovo, Pietro Belloni, Giovanna Boccuzzo
Disease mapping analyses the distribution of several disease outcomes within a territory. Primary goals include identifying areas with unexpected changes in mortality rates, studying the relationships among multiple diseases, and dividing the analysed territory into clusters based on the observed levels of disease incidence or mortality. In this work, we focus on detecting spatial mortality clusters, which occur when neighbouring areas within a territory exhibit similar mortality levels due to one or more diseases. When multiple causes of death are examined together, it is important to identify not only the spatial boundaries of the clusters but also the diseases that lead to their formation. However, existing methods in the literature struggle to address both aspects of this problem effectively and simultaneously. To overcome these limitations, we introduce perla, a multivariate Bayesian model that clusters the areas of a territory according to the observed mortality rates of multiple causes of death, while also exploiting information from external covariates. Our model incorporates the spatial structure of the data directly into the clustering probabilities by leveraging the stick-breaking formulation of the multinomial distribution. Additionally, it uses suitable global-local shrinkage priors so that cluster detection is driven by diseases showing substantial increases or decreases in mortality levels, while uninformative diseases are excluded. We propose a Markov chain Monte Carlo algorithm for posterior inference that uses closed-form Gibbs sampling updates for nearly every model parameter and requires no complex tuning. This work is primarily motivated by a case study on the territory of a local unit within the Italian public healthcare system, known as ULSS6 Euganea. To demonstrate the flexibility and effectiveness of our methodology, we also validate perla with a series of simulation experiments and an extensive case study on mortality levels in U.S. counties.
Title: Bayesian mapping of mortality clusters. Journal: Biostatistics 26(1). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12596199/pdf/
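The stick-breaking formulation of the multinomial maps K - 1 unconstrained linear predictors to K cluster probabilities through p_k = v_k ∏_{j<k}(1 - v_j), with v_k = logistic(η_k). This sketch shows only that mapping. The linear predictor in `area_cluster_probs` (covariates plus a placeholder for a spatial random effect) is a hypothetical illustration, not perla's exact specification.

```python
import math


def logistic(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def stick_breaking_probs(etas):
    """K - 1 linear predictors -> K probabilities: p_k = v_k * prod_{j<k}(1 - v_j),
    v_k = logistic(eta_k); the last cluster takes whatever stick remains."""
    probs, remaining = [], 1.0
    for eta in etas:
        v = logistic(eta)
        probs.append(remaining * v)
        remaining *= 1.0 - v
    probs.append(remaining)
    return probs


def area_cluster_probs(x_i, betas, phi_i):
    """Hypothetical predictor for area i: eta_ik = x_i . beta_k + phi_ik, where
    phi_ik stands in for a spatially structured random effect."""
    etas = [sum(a * b for a, b in zip(x_i, beta_k)) + phi
            for beta_k, phi in zip(betas, phi_i)]
    return stick_breaking_probs(etas)


p = area_cluster_probs([1.0, 2.0], [[0.5, -0.2], [0.1, 0.3]], [0.05, -0.1])
```

Because the stick is broken in a fixed order, the formulation is not exchangeable across clusters: the first cluster is modeled directly, and later ones are modeled conditionally on the earlier ones.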
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae043
Tingting Yu, Lang Wu, Ronald J Bosch, Davey M Smith, Rui Wang
Maximum likelihood inference for joint models of longitudinal and time-to-event data is often computationally intensive because the joint likelihood function contains intractable integrals. The computational challenges escalate further when modeling HIV-1 viral load data, owing to nonlinear trajectories and left-censored values arising from the assay's lower limit of quantification. In this paper, for a joint model comprising a nonlinear mixed-effects model and a Cox proportional hazards model, we develop a computationally efficient Stochastic EM (StEM) algorithm for parameter estimation. Furthermore, we propose a novel technique for fast standard error estimation, which estimates standard errors directly from the output of the StEM iterations and is broadly applicable to various joint modeling settings, such as those containing generalized linear mixed-effects models, parametric survival models, or more than two submodels. We evaluate the performance of the proposed methods through simulation studies and apply them to HIV-1 viral load data from six AIDS Clinical Trials Group studies to characterize viral rebound trajectories following the interruption of antiretroviral therapy (ART), accounting for the informative duration of off-ART periods.
Title: Fast standard error estimation for joint models of longitudinal and time-to-event data based on stochastic EM algorithms. Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823262/pdf/
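Stochastic EM replaces the intractable E-step expectation with a single random imputation, followed by an ordinary complete-data M-step. This minimal sketch covers only the left-censoring ingredient: a normal sample censored at a hypothetical lower limit of quantification `loq`. It includes no mixed-effects or survival submodels and no standard-error step, and the function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def _mean_sd(v):
    m = sum(v) / len(v)
    return m, math.sqrt(sum((a - m) ** 2 for a in v) / len(v))  # ML (divide-by-n) SD


def stem_left_censored_normal(y, censored, loq, iters=300, burn_in=100, seed=0):
    """Stochastic EM for a normal sample left-censored at `loq`.
    Returns the post-burn-in averages of the mean and SD iterates."""
    rng = random.Random(seed)
    observed = [v for v, c in zip(y, censored) if not c]
    n_cens = sum(1 for c in censored if c)
    mu, sd = _mean_sd(observed)  # naive start that ignores censoring
    kept_mu, kept_sd = [], []
    for it in range(iters):
        fit = NormalDist(mu, sd)
        p_loq = fit.cdf(loq)
        # S-step: impute each censored value from N(mu, sd^2) truncated to (-inf, loq]
        draws = []
        for _ in range(n_cens):
            u = min(max(p_loq * (1.0 - rng.random()), 1e-300), 1.0 - 1e-12)
            draws.append(fit.inv_cdf(u))
        # M-step: complete-data maximum likelihood for a normal sample
        mu, sd = _mean_sd(observed + draws)
        if it >= burn_in:
            kept_mu.append(mu)
            kept_sd.append(sd)
    return sum(kept_mu) / len(kept_mu), sum(kept_sd) / len(kept_sd)


rng = random.Random(7)
latent = [rng.gauss(2.0, 1.0) for _ in range(1000)]
loq = 1.5
censored = [v < loq for v in latent]
y = [loq if c else v for v, c in zip(latent, censored)]
mu_hat, sd_hat = stem_left_censored_normal(y, censored, loq)
naive_mu = sum(v for v, c in zip(y, censored) if not c) / (len(y) - sum(censored))
```

Dropping the censored values biases the mean upward (here toward roughly 2.5). The StEM iterates fluctuate around values close to the true mean of 2 and SD of 1, and averaging them after burn-in smooths out the imputation noise.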
Mediation analysis is a useful tool for investigating how molecular phenotypes such as gene expression mediate the effect of an exposure on health outcomes. However, commonly used mean-based total mediation effect measures can suffer from cancellation when the component-wise mediation effects of high-dimensional omics mediators act in opposite directions. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure, which relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R² measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude non-mediators, followed by ordinary least squares regressions for variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R² measure. Extensive simulation studies demonstrated that the proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding that gene expression mediates age-related variation in systolic blood pressure, and it identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in the R package CFR2M.
Title: Speeding up interval estimation for R²-based mediation effect of high-dimensional mediators via cross-fitting. Authors: Zhichao Xu, Chunlin Li, Sunyi Chi, Tianzhong Yang, Peng Wei. Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae037. Journal: Biostatistics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11823199/pdf/
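The cross-fitting idea can be sketched as follows: select candidate mediators on one half of the sample, estimate the R² measure on the other half, swap the halves, and average. This sketch assumes the common definition R²_med = R²(Y~X) + R²(Y~M) - R²(Y~X+M). It substitutes plain marginal correlation screening (SIS) for the iSIS step, and it omits the closed-form confidence interval. Function names, the number of kept mediators `k`, and the simulated data are hypothetical.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, cols):
    """R^2 from OLS of y on the predictor columns `cols` plus an intercept."""
    yc, xc = _center(y), [_center(c) for c in cols]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in xc] for ci in xc]
    xty = [sum(p * q for p, q in zip(ci, yc)) for ci in xc]
    coef = _solve(gram, xty)
    return sum(b * r for b, r in zip(coef, xty)) / sum(v * v for v in yc)


def r2_mediation(y, x, mediators):
    return r_squared(y, [x]) + r_squared(y, mediators) - r_squared(y, [x] + mediators)


def _abs_corr(u, v):
    uc, vc = _center(u), _center(v)
    return abs(sum(p * q for p, q in zip(uc, vc))) / math.sqrt(
        sum(p * p for p in uc) * sum(q * q for q in vc))


def cross_fitted_r2_mediation(y, x, m_cols, k, seed=0):
    """Screen on one half (plain SIS on |corr(M_j, Y)|), estimate on the other, swap."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    half_a, half_b = idx[: len(idx) // 2], idx[len(idx) // 2:]
    estimates = []
    for screen, fit in ((half_a, half_b), (half_b, half_a)):
        ys = [y[i] for i in screen]
        scores = [_abs_corr([m[i] for i in screen], ys) for m in m_cols]
        keep = sorted(range(len(m_cols)), key=lambda j: scores[j], reverse=True)[:k]
        estimates.append(r2_mediation([y[i] for i in fit], [x[i] for i in fit],
                                      [[m_cols[j][i] for i in fit] for j in keep]))
    return sum(estimates) / len(estimates)


rng = random.Random(3)
n, p = 400, 20
x = [rng.gauss(0, 1) for _ in range(n)]
m_cols = [[(0.8 * xi if j < 3 else 0.0) + rng.gauss(0, 1) for xi in x] for j in range(p)]
y = [0.3 * xi + 0.4 * sum(m_cols[j][i] for j in range(3)) + rng.gauss(0, 1)
     for i, xi in enumerate(x)]
est = cross_fitted_r2_mediation(y, x, m_cols, k=5)
m_null = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y_null = [0.5 * xi + rng.gauss(0, 1) for xi in x]
est_null = cross_fitted_r2_mediation(y_null, x, m_null, k=5)
```

In this simulated example, only the first three columns are true mediators. With these settings the population value of R²_med works out to roughly 0.5. When no column mediates the effect of X, the estimate should be near zero. Screening and estimating on different halves keeps the selection step from inflating the estimate.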
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxae042
Erin E Gabriel, Michael C Sachs, Arvid Sjölander
In instrumental variable (IV) settings, such as imperfect randomized trials and observational studies using Mendelian randomization, one may encounter a continuous exposure whose causal effect is not of primary interest. Instead, scientific interest may lie in a coarsened version of this exposure. Although there is an extensive literature on the impact of coarsening an exposure, with several works focusing specifically on IV settings, all methods proposed in this literature require parametric assumptions. Alternatively, as in the standard IV setting, one can consider partial identification via bounds that make no parametric assumptions. This was first pointed out in Alexander Balke's PhD dissertation. We extend and clarify his work and derive novel bounds in several settings, including for a three-level IV, which is the most likely case in Mendelian randomization (e.g., a single genetic variant coded by the number of effect alleles: 0, 1, or 2). We demonstrate our findings in two real-data examples: a randomized trial for peanut allergy in infants and a Mendelian randomization study of the effect of homocysteine on cardiovascular disease.
Title: The impact of coarsening an exposure on partial identifiability in instrumental variable settings. Journal: Biostatistics.
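Nonparametric bounds in the standard IV setting can be sketched for a binary exposure and binary outcome with an instrument of any number of levels. Each instrument level z bounds P(Y(x)=1) between P(Y=1, X=x | z) and P(Y=1, X=x | z) + P(X=1-x | z). Under IV independence and exclusion, the bounds from all levels are intersected. These are the simpler "natural" bounds, which are valid but not sharp in general. They are not Balke's sharp bounds, nor the coarsened-exposure bounds derived in the paper, which studies how coarsening changes what such bounds mean. The distributions below are made up.

```python
def natural_iv_bounds(p):
    """Bounds on the ATE P(Y(1)=1) - P(Y(0)=1) for binary exposure X and outcome Y.

    p maps each instrument level z to {(x, y): P(X=x, Y=y | Z=z)}.
    """
    lo, hi = {}, {}
    for x in (0, 1):
        lo[x] = max(pz[(x, 1)] for pz in p.values())
        hi[x] = min(pz[(x, 1)] + pz[(1 - x, 0)] + pz[(1 - x, 1)] for pz in p.values())
        if lo[x] > hi[x]:
            raise ValueError("empty bounds: the data contradict the IV assumptions")
    return lo[1] - hi[0], hi[1] - lo[0]


# Perfect compliance: Z = 1 always leads to exposure, Z = 0 never does.
perfect = {0: {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.0, (1, 1): 0.0},
           1: {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.7}}
lo, hi = natural_iv_bounds(perfect)

# Three-level instrument; generated from P(X=1|z) = 0.2, 0.5, 0.8 and
# P(Y(1)=1) = 0.6, P(Y(0)=1) = 0.3, so the true ATE is 0.3.
three_level = {0: {(1, 1): 0.12, (1, 0): 0.08, (0, 1): 0.24, (0, 0): 0.56},
               1: {(1, 1): 0.30, (1, 0): 0.20, (0, 1): 0.15, (0, 0): 0.35},
               2: {(1, 1): 0.48, (1, 0): 0.32, (0, 1): 0.06, (0, 0): 0.14}}
lo3, hi3 = natural_iv_bounds(three_level)
```

With perfect compliance, the bounds collapse to the point 0.3. In the three-level example they are [0.04, 0.44], and both ends are set by the two extreme instrument levels.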
Pub Date: 2024-12-31 · DOI: 10.1093/biostatistics/kxaf042
Kexin Qu, Christopher H Schmid, Tao Liu
An N-of-1 trial is a multiple crossover trial conducted in a single individual to provide evidence that directly informs personalized treatment decisions. Advances in wearable devices have greatly improved the feasibility of using these trials to identify optimal individual treatment plans, particularly when treatments differ among individuals and responses are highly heterogeneous. Our work was motivated by the I-STOP-AFib Study, which examined the impact of different triggers on the occurrence of atrial fibrillation (AF). We described a causal framework for N-of-1 trials using potential treatment-selection paths and potential outcome paths. Two estimands of the individual causal effect were defined: (i) the effect of continuous exposure, and (ii) the effect of an individual's observed behavior. We addressed three challenges: (i) imperfect compliance with the randomized treatment assignment; (ii) binary treatments and binary outcomes, which lead to the non-collapsibility problem when estimating odds ratios; and (iii) serial correlation in the longitudinal observations. We adopted a Bayesian instrumental variable (IV) approach in which the study randomization served as the IV, because it affected the patient's choice of exposure but did not directly affect the outcome. The individual causal effect was estimated through a system of two parametric Bayesian models. Our model circumvented the non-collapsibility and non-consistency issues by modeling the confounding mechanism through latent structural models and by basing inference on the Bayesian posterior of functionals. Autocorrelation in the repeated measurements was also accounted for. In the simulation study, our method substantially reduced bias and greatly improved the coverage of the estimated causal effect compared with existing methods (intention-to-treat [ITT], per-protocol [PP], and as-treated [AT] analyses). We applied the method to the I-STOP-AFib Study to estimate the individual effect of alcohol on AF occurrence.
Title: Instrumental variable approach to estimating individual causal effects in N-of-1 trials: application to ISTOP study. Biostatistics 26(1).
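The non-collapsibility issue in the abstract can be seen in a small worked example. In the sketch below, the logistic outcome model, its coefficients, and the binary prognostic factor U are all hypothetical; this is not the authors' model. U is independent of treatment, as under randomization, so there is no confounding. Even so, the odds ratio conditional on U is exactly 3 in each stratum, while the marginal odds ratio computed from U-averaged risks is only about 2.42.

```python
import math

def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def odds(p: float) -> float:
    return p / (1.0 - p)

# Hypothetical outcome model (assumption, not the paper's):
#   logit P(Y = 1 | A, U) = a + b*A + c*U
# A is a binary treatment and U a binary prognostic factor independent of A.
a, b, c = -1.0, math.log(3.0), 2.0
p_u1 = 0.5  # P(U = 1), identical in both treatment arms (no confounding)

def risk(A: int, U: int) -> float:
    return expit(a + b * A + c * U)

# Conditional (U-specific) odds ratios: exp(b) = 3 in every stratum.
conditional_or = {U: odds(risk(1, U)) / odds(risk(0, U)) for U in (0, 1)}

# Marginal risks average over the distribution of U; the marginal OR uses them.
marginal_risk = {A: (1 - p_u1) * risk(A, 0) + p_u1 * risk(A, 1) for A in (0, 1)}
marginal_or = odds(marginal_risk[1]) / odds(marginal_risk[0])

print(conditional_or)         # both values equal 3 up to floating-point rounding
print(round(marginal_or, 3))  # about 2.421: attenuated toward 1
```

Risk differences behave differently here. Because U is independent of A, the marginal risk difference equals the U-averaged conditional risk difference, so that scale is collapsible.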
Pub Date: 2024-12-31, DOI: 10.1093/biostatistics/kxae030
Serge Aleshin-Guendel, Jon Wakefield
The under-5 mortality rate (U5MR), a critical health indicator, is typically estimated from household surveys in low- and middle-income countries. Spatio-temporal disaggregation of household survey data can lead to highly variable estimates of U5MR, necessitating smoothing models that borrow information across space and time. The assumptions of common smoothing models may be unrealistic when certain time periods or regions are expected to have shocks in mortality relative to their neighbors, and this can lead to oversmoothed U5MR estimates. In this paper, we develop a spatial and temporal smoothing approach based on Gaussian Markov random field models that incorporate knowledge of these expected shocks in mortality. In a simulation study, we demonstrate the potential of these models to improve upon alternatives that do not incorporate such knowledge. We apply these models to estimate U5MR in Rwanda at the national level from 1985 to 2019, a period that includes the Rwandan civil war and genocide.
Title: Adaptive Gaussian Markov random fields for child mortality estimation. Biostatistics.
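The sketch below gives a rough picture of how ordinary temporal smoothing can oversmooth a known shock. It fits a first-order random walk (RW1), a simple intrinsic GMRF, to a synthetic series with a single-year spike, once with uniform smoothing and once with the penalty loosened around the shock year. Several details are assumptions made for illustration: the Gaussian likelihood, the synthetic spike, the values of `tau` and `sigma2`, and the per-increment weight used to encode an expected shock. The paper's models are built for household survey data, and its adaptive construction may differ from this device.

```python
import numpy as np

def rw1_structure(T: int, increment_weight: np.ndarray) -> np.ndarray:
    """Structure matrix of an RW1 prior on T time points.

    Each first difference mu[t+1] - mu[t] is penalized with weight
    increment_weight[t]**2; a weight below 1 loosens smoothing across that step.
    """
    D = np.diff(np.eye(T), axis=0)      # (T-1) x T first-difference matrix
    Dw = D * increment_weight[:, None]  # rescale each difference (hypothetical shock device)
    return Dw.T @ Dw

def posterior_mean(y: np.ndarray, Q: np.ndarray, tau: float, sigma2: float) -> np.ndarray:
    """Posterior mean of mu for y = mu + N(0, sigma2) noise and prior precision tau * Q."""
    T = len(y)
    precision = tau * Q + np.eye(T) / sigma2
    return np.linalg.solve(precision, y / sigma2)

T, shock = 20, 9
truth = np.zeros(T)
truth[shock] = 5.0      # hypothetical one-year mortality spike
y = truth.copy()        # noise-free data keep the comparison deterministic

tau, sigma2 = 10.0, 1.0  # hypothetical smoothing precision and observation variance

plain_w = np.ones(T - 1)
shock_w = np.ones(T - 1)
shock_w[[shock - 1, shock]] = 0.01  # loosen the steps into and out of the shock year

fit_plain = posterior_mean(y, rw1_structure(T, plain_w), tau, sigma2)
fit_shock = posterior_mean(y, rw1_structure(T, shock_w), tau, sigma2)

print(round(fit_plain[shock], 2), round(fit_shock[shock], 2))
```

With uniform weights, the fitted value at the spike year is pulled far below the true value of 5. The loosened version stays within about 0.01 of it, because the spike year is only weakly tied to its neighbors.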