Pub Date: 2024-04-05 | eCollection Date: 2024-06-01 | DOI: 10.1214/23-AOAS1856
Ashish Patel, Francis J DiTraglia, Verena Zuber, Stephen Burgess
Mendelian randomization (MR) is a widely-used method to estimate the causal relationship between a risk factor and disease. A fundamental part of any MR analysis is to choose appropriate genetic variants as instrumental variables. Genome-wide association studies often reveal that hundreds of genetic variants may be robustly associated with a risk factor, but in some situations investigators may have greater confidence in the instrument validity of only a smaller subset of variants. Nevertheless, the use of additional instruments may be optimal from the perspective of mean squared error even if they are slightly invalid; a small bias in estimation may be a price worth paying for a larger reduction in variance. For this purpose, we consider a method for "focused" instrument selection whereby genetic variants are selected to minimise the estimated asymptotic mean squared error of causal effect estimates. In a setting of many weak and locally invalid instruments, we propose a novel strategy to construct confidence intervals for post-selection focused estimators that guards against the worst case loss in asymptotic coverage. In empirical applications to: (i) validate lipid drug targets; and (ii) investigate vitamin D effects on a wide range of outcomes, our findings suggest that the optimal selection of instruments does not involve only a small number of biologically-justified instruments, but also many potentially invalid instruments.
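The bias-variance tradeoff behind focused selection can be illustrated with a toy sketch (not the authors' estimator): an inverse-variance-weighted (IVW) estimator built from two-sample summary statistics, with candidate instruments added greedily whenever a crude plug-in estimate of mean squared error (squared deviation from the core-set estimate, plus sampling variance) improves on the current variance. The function names and the greedy rule are illustrative assumptions.

```python
import numpy as np

def ivw_estimate(bx, by, se_by):
    """Inverse-variance-weighted causal effect from summary statistics:
    bx, by are variant-exposure and variant-outcome associations,
    se_by the standard errors of by. Assumes bx != 0."""
    w = bx**2 / se_by**2
    theta = np.sum(w * (by / bx)) / np.sum(w)
    return theta, 1.0 / np.sum(w)

def focused_selection(bx, by, se_by, core):
    """Greedy toy version of focused selection: start from a trusted
    core set, add a candidate instrument if the plug-in MSE estimate
    (squared bias proxy vs. the core-set estimate + variance) beats
    the variance of the current set."""
    theta0, _ = ivw_estimate(bx[core], by[core], se_by[core])
    selected = list(core)
    for j in (j for j in range(len(bx)) if j not in core):
        trial = selected + [j]
        theta, var = ivw_estimate(bx[trial], by[trial], se_by[trial])
        _, var_cur = ivw_estimate(bx[selected], by[selected], se_by[selected])
        if (theta - theta0) ** 2 + var < var_cur:  # est. MSE improves
            selected = trial
    return selected
```

With perfectly valid, equally strong instruments the sketch adds every candidate, since each one shrinks the variance with no bias penalty.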
"Selecting Invalid Instruments to Improve Mendelian Randomization with Two-Sample Summary Data." Annals of Applied Statistics, 18(2). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7615940/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1792
Xinyuan Chen, Michael O Harhay, Guangyu Tong, Fan Li
Assessing heterogeneity in the effects of treatments has become increasingly popular in the field of causal inference and carries important implications for clinical decision-making. While extensive literature exists for studying treatment effect heterogeneity when outcomes are fully observed, there has been limited development in tools for estimating heterogeneous causal effects when patient-centered outcomes are truncated by a terminal event, such as death. Due to mortality occurring during study follow-up, the outcomes of interest are unobservable, undefined, or not fully observed for many participants in which case principal stratification is an appealing framework to draw valid causal conclusions. Motivated by the Acute Respiratory Distress Syndrome Network (ARDSNetwork) ARDS respiratory management (ARMA) trial, we developed a flexible Bayesian machine learning approach to estimate the average causal effect and heterogeneous causal effects among the always-survivors stratum when clinical outcomes are subject to truncation. We adopted Bayesian additive regression trees (BART) to flexibly specify separate mean models for the potential outcomes and latent stratum membership. In the analysis of the ARMA trial, we found that the low tidal volume treatment had an overall benefit for participants sustaining acute lung injuries on the outcome of time to returning home but substantial heterogeneity in treatment effects among the always-survivors, driven most strongly by biologic sex and the alveolar-arterial oxygen gradient at baseline (a physiologic measure of lung function and degree of hypoxemia). These findings illustrate how the proposed methodology could guide the prognostic enrichment of future trials in the field.
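The need for principal stratification can be made concrete with a small oracle example (illustrative only, not the paper's Bayesian machine learning estimator): when treatment affects survival, naively comparing observed survivors mixes latent strata, whereas the survivor average causal effect (SACE) is defined on the always-survivor stratum, where both potential outcomes exist.

```python
import numpy as np

def sace_vs_naive(S0, S1, Y0, Y1, T):
    """Oracle contrast of SACE with the naive survivor comparison.
    S0/S1 are potential survival under control/treatment, Y0/Y1 the
    potential outcomes (values for units dead under that arm are
    placeholders and never used), T the treatment indicator."""
    always = (S0 == 1) & (S1 == 1)          # always-survivor stratum
    sace = Y1[always].mean() - Y0[always].mean()
    # naive: compare observed survivors across arms; under treatment
    # this mixes always-survivors with treatment-protected units
    naive = Y1[T & (S1 == 1)].mean() - Y0[(~T) & (S0 == 1)].mean()
    return sace, naive
```

In the test below, two units are always-survivors and two survive only if treated; the protected units' large outcomes inflate the naive contrast to 3 while the SACE is 1.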
"A BAYESIAN MACHINE LEARNING APPROACH FOR ESTIMATING HETEROGENEOUS SURVIVOR CAUSAL EFFECTS: APPLICATIONS TO A CRITICAL CARE TRIAL." Annals of Applied Statistics, 18(1), 350-374. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10919396/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1817
Alan J Aw, Jeffrey P Spence, Yun S Song
In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given the dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
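What an exchangeability test must detect can be seen in a simplified permutation sketch (not the V test itself, which relies on large-sample asymptotics rather than permutations): under exchangeability with independent features, independently permuting each feature across samples preserves the null distribution, so an unusually large variance of pairwise Hamming distances signals structure such as population stratification.

```python
import numpy as np

def pairwise_dist_variance(X):
    """Variance of pairwise Hamming distances between rows of a
    binary matrix X (samples x features)."""
    n = X.shape[0]
    d = [np.sum(X[i] != X[j]) for i in range(n) for j in range(i + 1, n)]
    return np.var(d)

def exchangeability_pvalue(X, n_perm=500, seed=0):
    """Permutation p-value: shuffling each feature independently
    across samples keeps feature marginals but enforces the null of
    exchangeability with independent features."""
    rng = np.random.default_rng(seed)
    obs = pairwise_dist_variance(X)
    null = np.array([
        pairwise_dist_variance(
            np.column_stack([rng.permutation(col) for col in X.T]))
        for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (1 + n_perm)
```

With two clearly distinct subpopulations, pairwise distances are bimodal (0 within groups, large between), so the observed variance far exceeds the permutation null and the p-value is small.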
"A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS." Annals of Applied Statistics, 18(1), 858-881. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11115382/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1782
Yiwen Zhang, Ran Dai, Ying Huang, Ross Prentice, Cheng Zheng
Systematic measurement error in self-reported data creates important challenges in association studies between dietary intakes and chronic disease risks, especially when multiple dietary components are studied jointly. The joint regression calibration method has been developed for measurement error correction when objectively measured biomarkers are available for all dietary components of interest. Unfortunately, objectively measured biomarkers are only available for very few dietary components, which limits the application of the joint regression calibration method. Recently, for single dietary components, controlled feeding studies have been performed to develop new biomarkers for many more dietary components. However, it is unclear whether the biomarkers separately developed for single dietary components are valid for joint calibration. In this paper, we show that biomarkers developed for single dietary components cannot be used for joint regression calibration. We propose new methods to utilize controlled feeding studies to develop valid biomarkers for joint regression calibration to estimate the association between multiple dietary components simultaneously with the disease of interest. Asymptotic distribution theory for the proposed estimators is derived. Extensive simulations are performed to study the finite sample performance of the proposed estimators. We apply our methods to examine the joint effects of sodium and potassium intakes on cardiovascular disease incidence using the Women's Health Initiative cohort data. We identify positive associations between sodium intake and cardiovascular diseases as well as negative associations between potassium intake and cardiovascular disease.
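The core regression-calibration step can be sketched as follows (a minimal illustration, assuming a calibration substudy where both self-reports and unbiased biomarkers are observed for two dietary components jointly; function names are hypothetical): fit the calibration equations E[X | Z] on the substudy, then impute calibrated exposures for the main study from its self-reports.

```python
import numpy as np

def joint_regression_calibration(Z_main, Z_cal, X_cal):
    """Joint calibration for two error-prone exposures.
    Z_cal: self-reports in the calibration substudy (n_cal x 2).
    X_cal: unbiased biomarker measures there (n_cal x 2).
    Z_main: self-reports in the main study (n_main x 2).
    Returns imputed (calibrated) exposures for the main study."""
    A = np.column_stack([np.ones(len(Z_cal)), Z_cal])    # intercept + Z
    coef, *_ = np.linalg.lstsq(A, X_cal, rcond=None)     # (3 x 2) coefficients
    A_main = np.column_stack([np.ones(len(Z_main)), Z_main])
    return A_main @ coef
```

The calibrated values would then replace the self-reports in the disease model; the key point of the paper is that the calibration equations must be fit jointly, not one component at a time.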
"USING SIMULTANEOUS REGRESSION CALIBRATION TO STUDY THE EFFECT OF MULTIPLE ERROR-PRONE EXPOSURES ON DISEASE RISK UTILIZING BIOMARKERS DEVELOPED FROM A CONTROLLED FEEDING STUDY." Annals of Applied Statistics, 18(1), 125-143. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10836829/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1797
Zikai Lin, Yajuan Si, Jian Kang
Image-on-scalar regression has been a popular approach to modeling the association between brain activities and scalar characteristics in neuroimaging research. The associations could be heterogeneous across individuals in the population, as indicated by recent large-scale neuroimaging studies, for example, the Adolescent Brain Cognitive Development (ABCD) Study. The ABCD data can inform our understanding of heterogeneous associations and how to leverage the heterogeneity and tailor interventions to increase the number of youths who benefit. It is of great interest to identify subgroups of individuals from the population such that: (1) within each subgroup the brain activities have homogeneous associations with the clinical measures; (2) across subgroups the associations are heterogeneous; and (3) the group allocation depends on individual characteristics. Existing image-on-scalar regression methods and clustering methods cannot directly achieve this goal. We propose a latent subgroup image-on-scalar regression model (LASIR) to analyze large-scale, multisite neuroimaging data with diverse sociodemographics. LASIR introduces the latent subgroup for each individual and group-specific, spatially varying effects, with an efficient stochastic expectation maximization algorithm for inferences. We demonstrate that LASIR outperforms existing alternatives for subgroup identification of brain activation patterns with functional magnetic resonance imaging data via comprehensive simulations and applications to the ABCD study. We have released our reproducible code for public use with the software package available on GitHub.
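The alternating structure of a stochastic EM algorithm for latent subgroups can be sketched with a deliberately simplified two-group mixture of linear regressions (standing in for LASIR's group-specific, spatially varying effects; names and details are illustrative assumptions): the E-step samples subgroup labels from their posterior, the M-step refits per-group least squares.

```python
import numpy as np

def stochastic_em_mix_reg(X, y, z_init, n_iter=50, seed=0):
    """Toy stochastic EM for a two-subgroup mixture of linear
    regressions. z_init: initial subgroup labels (0/1)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z_init).copy()
    betas = np.zeros((2, X.shape[1]))
    sigma = np.ones(2)
    for _ in range(n_iter):
        for k in (0, 1):                                  # M-step
            idx = z == k
            if idx.sum() < X.shape[1]:                    # guard: tiny group
                continue
            betas[k] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
            sigma[k] = max(np.std(y[idx] - X[idx] @ betas[k]), 1e-3)
        # E-step: sample labels from their posterior (equal priors)
        ll = [-0.5 * ((y - X @ betas[k]) / sigma[k]) ** 2 - np.log(sigma[k])
              for k in (0, 1)]
        p1 = 1.0 / (1.0 + np.exp(np.clip(ll[0] - ll[1], -50, 50)))
        z = (rng.random(len(y)) < p1).astype(int)
    return betas, z
```

With well-separated groups the sampled labels stabilize quickly and the per-group coefficients recover the group-specific effects.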
"LATENT SUBGROUP IDENTIFICATION IN IMAGE-ON-SCALAR REGRESSION." Annals of Applied Statistics, 18(1), 468-486. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11156244/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1791
Zeda Li, Yu Ryan Yue, Scott A Bruce
We propose a novel analysis of power (ANOPOW) model for analyzing replicated nonstationary time series commonly encountered in experimental studies. Based on a locally stationary ANOPOW Cramér spectral representation, the proposed model can be used to compare the second-order time-varying frequency patterns among different groups of time series and to estimate group effects as functions of both time and frequency. Formulated in a Bayesian framework, independent two-dimensional second-order random walk (RW2D) priors are assumed on each of the time-varying functional effects for flexible and adaptive smoothing. A piecewise stationary approximation of the nonstationary time series is used to obtain localized estimates of time-varying spectra. Posterior distributions of the time-varying functional group effects are then obtained via integrated nested Laplace approximations (INLA) at a low computational cost. The large-sample distribution of local periodograms can be appropriately utilized to improve estimation accuracy since INLA allows modeling of data with various types of distributions. The usefulness of the proposed model is illustrated through two real data applications: analyses of seismic signals and pupil diameter time series in children with attention deficit hyperactivity disorder. Simulation studies, Supplementary Materials (Li, Yue and Bruce, 2023a), and R code (Li, Yue and Bruce, 2023b) for this article are also available.
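The piecewise stationary approximation can be illustrated with a minimal sketch (assumptions: equal-width blocks and raw, untapered periodograms, unlike the smoothed Bayesian estimates in the full model): split the series into approximately stationary blocks and compute one periodogram per block, giving a crude time-varying spectrum.

```python
import numpy as np

def local_periodograms(x, n_blocks):
    """Crude time-varying spectral estimate: periodogram of each of
    n_blocks roughly equal segments of the series x. Frequencies for
    block b are np.fft.rfftfreq(len(block))."""
    out = []
    for b in np.array_split(np.asarray(x, dtype=float), n_blocks):
        n = len(b)
        f = np.fft.rfft(b - b.mean())        # remove the block mean
        out.append(np.abs(f) ** 2 / n)       # raw periodogram
    return out
```

For a pure sinusoid with 8 cycles per 64-sample block, every block's periodogram peaks at frequency bin 8.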
"ANOPOW FOR REPLICATED NONSTATIONARY TIME SERIES IN EXPERIMENTS." Annals of Applied Statistics, 18(1), 328-349. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906746/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1813
J Brandon Carter, Christopher R Browning, Bethany Boettner, Nicolo Pinchak, Catherine A Calder
Collective efficacy-the capacity of communities to exert social control toward the realization of their shared goals-is a foundational concept in the urban sociology and neighborhood effects literature. Traditionally, empirical studies of collective efficacy use large sample surveys to estimate collective efficacy of different neighborhoods within an urban setting. Such studies have demonstrated an association between collective efficacy and local variation in community violence, educational achievement, and health. Unlike traditional collective efficacy measurement strategies, the Adolescent Health and Development in Context (AHDC) Study implemented a new approach, obtaining spatially-referenced, place-based ratings of collective efficacy from a representative sample of individuals residing in Columbus, OH. In this paper we introduce a novel nonstationary spatial model for interpolation of the AHDC collective efficacy ratings across the study area, which leverages administrative data on land use. Our constructive model specification strategy involves dimension expansion of a latent spatial process and the use of a filter defined by the land-use partition of the study region to connect the latent multivariate spatial process to the observed ordinal ratings of collective efficacy. Careful consideration is given to the issues of parameter identifiability, computational efficiency of an MCMC algorithm for model fitting, and fine-scale spatial prediction of collective efficacy.
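One simplified reading of the land-use filter is as a selector that passes each location's latent multivariate process through the loading vector of its land-use class before thresholding into ordinal rating categories. The generative toy below reflects that assumption only, not the paper's full model (which also handles dimension expansion and Bayesian fitting):

```python
import numpy as np

def simulate_ratings(latent, landuse, loadings, cutpoints):
    """latent: (n_loc x d) latent process values; landuse: (n_loc,)
    land-use class index per location; loadings: (n_class x d) class
    loading vectors; cutpoints: sorted thresholds mapping the filtered
    latent value to ordinal categories 0, 1, ..., len(cutpoints)."""
    eta = np.einsum('ij,ij->i', loadings[landuse], latent)  # filter
    return np.searchsorted(cutpoints, eta)                  # threshold
```

The filter lets two locations with identical latent values receive different ratings when their land-use classes weight the latent dimensions differently.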
"LAND-USE FILTERING FOR NONSTATIONARY SPATIAL PREDICTION OF COLLECTIVE EFFICACY IN AN URBAN ENVIRONMENT." Annals of Applied Statistics, 18(1), 794-818. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11146085/pdf/
Pub Date: 2024-03-01 | Epub Date: 2024-01-31 | DOI: 10.1214/23-aoas1798
Gang Xu, Amei Amei, Weimiao Wu, Yunqing Liu, Linchuan Shen, Edwin C Oh, Zuoheng Wang
Many genetic studies contain rich information on longitudinal phenotypes that require powerful analytical tools for optimal analysis. Genetic analysis of longitudinal data that incorporates temporal variation is important for understanding the genetic architecture and biological variation of complex diseases. Most of the existing methods assume that the contribution of genetic variants is constant over time and fail to capture the dynamic pattern of disease progression. However, the relative influence of genetic variants on complex traits fluctuates over time. In this study, we propose a retrospective varying coefficient mixed model association test, RVMMAT, to detect time-varying genetic effect on longitudinal binary traits. We model dynamic genetic effect using smoothing splines, estimate model parameters by maximizing a double penalized quasi-likelihood function, design a joint test using a Cauchy combination method, and evaluate statistical significance via a retrospective approach to achieve robustness to model misspecification. Through simulations we illustrated that the retrospective varying-coefficient test was robust to model misspecification under different ascertainment schemes and gained power over the association methods assuming constant genetic effect. We applied RVMMAT to a genome-wide association analysis of longitudinal measure of hypertension in the Multi-Ethnic Study of Atherosclerosis. Pathway analysis identified two important pathways related to G-protein signaling and DNA damage. Our results demonstrated that RVMMAT could detect biologically relevant loci and pathways in a genome scan and provided insight into the genetic architecture of hypertension.
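The Cauchy combination device used for the joint test is standard and easy to sketch: transform each p-value to a standard Cauchy quantile, average (optionally with weights), and map the combined statistic back to a p-value; its validity is robust to dependence among the component tests. A minimal version:

```python
import numpy as np

def cauchy_combination(pvals, weights=None):
    """Cauchy combination test: combine p-values p_i via
    T = sum_i w_i * tan((0.5 - p_i) * pi), then map T back through
    the standard Cauchy CDF. Weights default to 1/m."""
    p = np.asarray(pvals, dtype=float)
    w = np.full(p.shape, 1.0 / p.size) if weights is None else np.asarray(weights)
    stat = np.sum(w * np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(stat) / np.pi
```

Combining identical p-values returns the same p-value, and a single very small component drives the combined p-value down, which is what makes the test powerful under sparse signals.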
RETROSPECTIVE VARYING COEFFICIENT ASSOCIATION ANALYSIS OF LONGITUDINAL BINARY TRAITS: APPLICATION TO THE IDENTIFICATION OF GENETIC LOCI ASSOCIATED WITH HYPERTENSION. Annals of Applied Statistics, 18(1), 487-505.
Pub Date: 2024-03-01, Epub Date: 2024-01-31, DOI: 10.1214/23-aoas1809
Nicholas Hartman, Joseph M Messana, Jian Kang, Abhijit S Naik, Tempie H Shearon, Kevin He
Risk-adjusted quality measures are used to evaluate healthcare providers against national norms while controlling for factors beyond their control. Existing healthcare provider profiling approaches typically assume that the between-provider variation in these measures is entirely due to meaningful differences in quality of care. In practice, however, much of the between-provider variation will be due to trivial fluctuations in healthcare quality or to unobservable confounding risk factors. If these additional sources of variation are not accounted for, conventional methods will disproportionately identify larger providers as outliers, even though their departures from the national norms may not be "extreme" or clinically meaningful. Motivated by efforts to evaluate the quality of care provided by transplant centers, we develop a composite evaluation score based on a novel individualized empirical null method, which robustly accounts for overdispersion due to unobserved risk factors, models the marginal variance of standardized scores as a function of the effective sample size, and requires only publicly available center-level statistics. The evaluations of United States kidney transplant centers based on the proposed composite score differ substantially from those based on conventional methods. Simulations show that the proposed empirical null approach classifies centers more accurately in terms of quality of care than existing methods.
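The core idea of re-standardizing provider scores against a null whose variance depends on sample size can be illustrated with a deliberately simplified sketch. This is not the authors' individualized empirical null estimator; the linear variance model and the function name are assumptions made for illustration:

```python
import numpy as np
from scipy.stats import norm

def empirical_null_flags(z, n_eff, alpha=0.05):
    """Flag providers whose standardized scores are extreme relative to an
    empirical null whose variance is allowed to grow with effective sample size.

    Simplified illustration: the marginal variance of the scores is modelled
    as a + b * n_eff, fitted by least squares on the squared scores, and each
    score is re-standardized against its fitted null standard deviation.
    """
    z = np.asarray(z, dtype=float)
    n_eff = np.asarray(n_eff, dtype=float)
    # Least-squares fit of E[z^2] ~ a + b * n_eff.
    X = np.column_stack([np.ones_like(n_eff), n_eff])
    coef, *_ = np.linalg.lstsq(X, z**2, rcond=None)
    var_null = np.clip(X @ coef, 1e-8, None)
    z_adj = z / np.sqrt(var_null)
    cutoff = norm.ppf(1.0 - alpha / 2.0)
    return z_adj, np.abs(z_adj) > cutoff
```

Because larger providers tend to produce larger raw standardized scores under overdispersion, re-scaling by a sample-size-dependent null variance counteracts the tendency to flag them disproportionately.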
COMPOSITE SCORES FOR TRANSPLANT CENTER EVALUATION: A NEW INDIVIDUALIZED EMPIRICAL NULL METHOD. Annals of Applied Statistics, 18(1), 729-748.
Pub Date: 2023-12-01, Epub Date: 2023-10-30, DOI: 10.1214/23-aoas1758
Jingjing Zou, Tuo Lin, Chongzhi Di, John Bellettiere, Marta M Jankowska, Sheri J Hartman, Dorothy D Sears, Andrea Z LaCroix, Cheryl L Rock, Loki Natarajan
Physical activity (PA) is significantly associated with many health outcomes. The wide use of wearable accelerometer-based activity trackers in recent years has provided a unique opportunity for in-depth research on PA and its relations with health outcomes and interventions. Past analyses of activity-tracker data rely heavily on aggregating minute-level PA records into day-level summary statistics, in which important information about temporal/diurnal PA patterns is lost. In this paper we propose a novel functional data analysis approach based on Riemann manifolds for modeling PA and its longitudinal changes. We model smoothed minute-level PA over a day as a one-dimensional Riemann manifold and longitudinal changes in PA across visits as deformations between manifolds. The variability in PA changes among a cohort of subjects is characterized via variability in these deformations. Functional principal component analysis is further adopted to model the deformations, and PC scores are used as a proxy in modeling the relation between changes in PA and health outcomes and/or interventions. We conduct comprehensive analyses of data from two clinical trials, Reach for Health (RfH) and Metabolism, Exercise and Nutrition at UCSD (MENU), focusing respectively on the effect of interventions on longitudinal changes in PA patterns and on how different modes of change in PA influence weight loss. The proposed approach reveals unique modes of change, including overall enhanced PA, boosted morning PA, and shifts of active hours specific to each study cohort. The results bring new insights into the study of longitudinal changes in PA and health and have the potential to facilitate the design of effective health interventions and guidelines.
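The step from per-subject deformation curves to PC scores can be sketched with a generic discretized functional PCA via the SVD. This is a plain FPCA sketch under the assumption that deformations are stored as one curve per subject; it omits the Riemann-manifold machinery of the paper, and the function name is illustrative:

```python
import numpy as np

def fpca_scores(deformations, n_components=2):
    """Functional PCA of per-subject deformation curves.

    `deformations` is an (n_subjects, n_timepoints) array, e.g. the change
    in smoothed minute-level activity between two visits, evaluated on a
    common time grid. Returns the mean curve, the leading eigenfunctions
    (discretized), and per-subject PC scores.
    """
    X = np.asarray(deformations, dtype=float)
    mu = X.mean(axis=0)
    Xc = X - mu                               # center the curves
    # SVD of the centered data matrix yields the eigenfunctions as rows of Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                # per-subject PC scores
    return mu, components, scores
```

The PC scores can then serve as low-dimensional covariates or responses in downstream regressions linking modes of PA change to outcomes such as weight loss.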
A RIEMANN MANIFOLD MODEL FRAMEWORK FOR LONGITUDINAL CHANGES IN PHYSICAL ACTIVITY PATTERNS. Annals of Applied Statistics, 17(4), 3216-3240.