Pub Date: 2025-12-01; Epub Date: 2025-12-22; DOI: 10.1214/25-aos2519
Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale (the mean outcome under different treatments for each unit and each time) with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible because there are more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and the number of time points grow to infinity together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial, HeartSteps.
COUNTERFACTUAL INFERENCE IN SEQUENTIAL EXPERIMENTS. Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah. Annals of Statistics 53(6): 2380-2406, 2025-12-01. DOI: 10.1214/25-aos2519. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758907/pdf/
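The nearest-neighbor idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' exact estimator: the function name `nn_counterfactual_mean`, the threshold `eta`, the squared-difference distance, and the choice to include the target unit itself when it was observed under the requested treatment are all assumptions made for illustration. The sketch averages the time-`t` outcomes of units that received treatment `a` at time `t` and whose outcome histories under `a` are close to those of unit `i`.

```python
import math

def nn_counterfactual_mean(Y, A, i, t, a, eta):
    """Illustrative estimate of the mean outcome of unit i at time t under treatment a.

    Y[j][s] is the observed outcome and A[j][s] the assigned treatment
    for unit j at time s. eta is a hypothetical distance threshold.
    """
    n_units, n_times = len(Y), len(Y[0])
    neighbor_outcomes = []
    for j in range(n_units):
        if A[j][t] != a:
            continue  # unit j was not observed under treatment a at time t
        if j != i:
            # other time points where both units received treatment a
            shared = [s for s in range(n_times)
                      if s != t and A[i][s] == a and A[j][s] == a]
            if not shared:
                continue  # no overlap, so the distance is undefined
            dist = sum((Y[i][s] - Y[j][s]) ** 2 for s in shared) / len(shared)
            if dist > eta:
                continue  # unit j is too far from unit i
        neighbor_outcomes.append(Y[j][t])
    if not neighbor_outcomes:
        return math.nan  # no usable neighbor for this (unit, time, treatment)
    return sum(neighbor_outcomes) / len(neighbor_outcomes)

# Toy data: unit 1 tracks unit 0 under treatment 1, unit 2 does not.
Y = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [10.0, 20.0, 30.0]]
A = [[1, 1, 0], [1, 1, 1], [1, 1, 1]]
print(nn_counterfactual_mean(Y, A, i=0, t=2, a=1, eta=0.5))  # 5.0
```

In the toy data, unit 0 was not treated at time 2, so its mean outcome under treatment 1 is borrowed from unit 1, the only unit within distance `eta`. A small `eta` trades a larger variance (fewer neighbors) for a smaller bias.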
Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.
REINFORCEMENT LEARNING FOR INDIVIDUAL OPTIMAL POLICY FROM HETEROGENEOUS DATA. Rui Miao, Babak Shahbaba, Annie Qu. Annals of Statistics 53(4): 1513-1534, 2025-08-01. DOI: 10.1214/25-aos2512. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12439830/pdf/
Pub Date: 2025-02-01; Epub Date: 2025-02-13; DOI: 10.1214/24-aos2457
Satarupa Bhattacharjee, Bing Li, Lingzhou Xue
Random objects are complex non-Euclidean data taking values in general metric spaces, possibly devoid of any underlying vector space structure. Such data are becoming increasingly abundant with the rapid advancement in technology. Examples include probability distributions, positive semidefinite matrices and data on Riemannian manifolds. However, except for regression for object-valued response with Euclidean predictors and distribution-on-distribution regression, there has been limited development of a general framework for object-valued response with object-valued predictors in the literature. To fill this gap, we introduce the notion of a weak conditional Fréchet mean based on Carleman operators and then propose a global nonlinear Fréchet regression model through the reproducing kernel Hilbert space (RKHS) embedding. Furthermore, we establish the relationships between the conditional Fréchet mean and the weak conditional Fréchet mean for both Euclidean and object-valued data. We also show that the state-of-the-art global Fréchet regression developed by Petersen and Müller (Ann. Statist. 47 (2019) 691-719) emerges as a special case of our method by choosing a linear kernel. We require that the metric space for the predictor admits a reproducing kernel, while the intrinsic geometry of the metric space for the response is utilized to study the asymptotic properties of the proposed estimates. Numerical studies, including extensive simulations and a real application, are conducted to investigate the finite-sample performance.
NONLINEAR GLOBAL FRÉCHET REGRESSION FOR RANDOM OBJECTS VIA WEAK CONDITIONAL EXPECTATION. Satarupa Bhattacharjee, Bing Li, Lingzhou Xue. Annals of Statistics 53(1): 117-143, 2025-02-01. DOI: 10.1214/24-aos2457. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12407180/pdf/
Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical efficiency of watermarks and designing powerful detection rules. Inspired by the hypothesis testing formulation of watermark detection, our framework starts by selecting a pivotal statistic of the text and a secret key (provided by the LLM to the verifier) to control the false positive rate (the error of mistakenly detecting human-written text as LLM-generated). Next, this framework allows one to evaluate the power of watermark detection rules by obtaining a closed-form expression of the asymptotic false negative rate (the error of incorrectly classifying LLM-generated text as human-written). Our framework further reduces the problem of determining the optimal detection rule to solving a minimax optimization program. We apply this framework to two representative watermarks, one of which has been internally implemented at OpenAI, and obtain several findings that can be instrumental in guiding the practice of implementing watermarks. In particular, we derive optimal detection rules for these watermarks under our framework. Numerical experiments demonstrate that these theoretically derived detection rules are competitive with, and sometimes more powerful than, existing detection approaches.
A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules. Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J Su. Annals of Statistics 53(1): 322-351, 2025-02-01. DOI: 10.1214/24-aos2468. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467635/pdf/
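The hypothesis-testing view of watermark detection described above can be sketched as follows. The sketch assumes that each token yields a pivotal statistic that is Uniform(0, 1) when the text is human-written, and it uses the sum of `-log(1 - u)` as the test score, a simple rule often associated with Gumbel-max-style watermarks. It is not the optimal rule derived in the paper, and the names `erlang_sf` and `detect_watermark` are hypothetical. Under the null, each `-log(1 - u)` is Exp(1), so the score is Gamma(n, 1) and the p-value has a closed form.

```python
import math

def erlang_sf(s, n):
    """P(S >= s) for S ~ Gamma(shape=n, scale=1) with integer n >= 1."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k          # term is now s**k / k!
        total += term
    return math.exp(-s) * total

def detect_watermark(pivots, alpha=0.01):
    """Return (p_value, flagged) for pivotal statistics in [0, 1).

    Under the null (human-written text) the pivots are assumed i.i.d.
    Uniform(0, 1), so rejecting when p_value <= alpha keeps the false
    positive rate at most alpha.
    """
    score = sum(-math.log(1.0 - u) for u in pivots)
    p_value = erlang_sf(score, len(pivots))
    return p_value, p_value <= alpha

# Pivots near 1 are unlikely under the null and suggest a watermark.
print(detect_watermark([0.99] * 10)[1])  # True
print(detect_watermark([0.5] * 10)[1])   # False
```

Because the null distribution of the score does not depend on the language model, the threshold can be set without knowing how human text is distributed. For very long texts the closed-form tail should be evaluated in log-space to avoid overflow.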
Pub Date: 2024-06-01; Epub Date: 2024-08-11; DOI: 10.1214/24-aos2378
Bingxin Zhao, Shurong Zheng, Hongtu Zhu
Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and on external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having access only to summary-level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrices. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.
ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS. Bingxin Zhao, Shurong Zheng, Hongtu Zhu. Annals of Statistics 52(3): 948-965, 2024-06-01. DOI: 10.1214/24-aos2378. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391480/pdf/
Pub Date: 2024-04-01; Epub Date: 2024-05-09; DOI: 10.1214/24-aos2369
Edward H Kennedy, Sivaraman Balakrishnan, James M Robins, Larry Wasserman
Estimation of heterogeneous causal effects - i.e., how effects of policies and treatments vary across subjects - is a fundamental task in causal inference. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but questions surrounding optimality have remained largely unanswered. In particular, a minimax theory of optimality has yet to be developed, with the minimax rate of convergence and construction of rate-optimal estimators remaining open problems. In this paper we derive the minimax rate for CATE estimation, in a Hölder-smooth nonparametric model, and present a new local polynomial estimator, giving high-level conditions under which it is minimax optimal. Our minimax lower bound is derived via a localized version of the method of fuzzy hypotheses, combining lower bound constructions for nonparametric regression and functional estimation. Our proposed estimator can be viewed as a local polynomial R-Learner, based on a localized modification of higher-order influence function methods. The minimax rate we find exhibits several interesting features, including a non-standard elbow phenomenon and an unusual interpolation between nonparametric regression and functional estimation rates. The latter quantifies how the CATE, as an estimand, can be viewed as a regression/functional hybrid.
Minimax rates for heterogeneous causal effect estimation. Edward H Kennedy, Sivaraman Balakrishnan, James M Robins, Larry Wasserman. Annals of Statistics 52(2): 793-816, 2024-04-01. DOI: 10.1214/24-aos2369. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11960818/pdf/
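For readers unfamiliar with the estimand in the abstract above, the CATE and the residual-on-residual identity behind R-Learner-type estimators can be written out as below. The notation is an assumption, not taken from the paper: $Y^{1}, Y^{0}$ are potential outcomes, $A \in \{0,1\}$ is the treatment, $X$ the covariates, $\mu(x) = \mathbb{E}[Y \mid X = x]$ and $\pi(x) = \mathbb{P}(A = 1 \mid X = x)$. The second display holds under consistency and no unmeasured confounding.

```latex
\tau(x) = \mathbb{E}\left[ Y^{1} - Y^{0} \mid X = x \right],
\qquad
Y - \mu(X) = \bigl(A - \pi(X)\bigr)\,\tau(X) + \varepsilon,
\quad
\mathbb{E}\left[ \varepsilon \mid X, A \right] = 0 .
```

The identity shows why the CATE behaves like a regression/functional hybrid: $\tau$ is a regression function, but it is only reachable through the nuisance functions $\mu$ and $\pi$, which must themselves be estimated.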
Pub Date: 2024-02-01; Epub Date: 2024-03-07; DOI: 10.1214/23-aos2339
Yeqing Zhou, Kai Xu, Liping Zhu, Runze Li
To test independence between two high-dimensional random vectors, we propose three tests based on the rank-based indices derived from Hoeffding's $D$, Blum-Kiefer-Rosenblatt's $R$ and Bergsma-Dassios-Yanagimoto's $\tau^*$. Under the null hypothesis of independence, we show that the distributions of the proposed test statistics converge to normal ones if the dimensions diverge arbitrarily with the sample size. We further derive an explicit rate of convergence. Thanks to their invariance under monotone transformations, these distribution-free tests can be readily applied to generally distributed random vectors, including heavy-tailed ones. We further study the local power of the proposed tests and compare their relative efficiencies with two classic distance covariance/correlation based tests in high-dimensional settings. We establish explicit relationships between $D$, $R$, $\tau^*$ and Pearson's correlation for bivariate normal random variables. The relationships serve as a basis for power comparison. Our theoretical results show that under a Gaussian equicorrelation alternative, (i) the proposed tests are superior to the two classic distance covariance/correlation based tests if the components of the random vectors have very different scales; (ii) the asymptotic efficiencies of the proposed tests based on $D$, $\tau^*$ and $R$ are sorted in descending order.
RANK-BASED INDICES FOR TESTING INDEPENDENCE BETWEEN TWO HIGH-DIMENSIONAL VECTORS. Yeqing Zhou, Kai Xu, Liping Zhu, Runze Li. Annals of Statistics 52(1): 184-206, 2024-02-01. DOI: 10.1214/23-aos2339. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11064990/pdf/
Pub Date: 2024-02-01; Epub Date: 2024-03-07; DOI: 10.1214/23-aos2347
Wen Wang, Shihao Wu, Ziwei Zhu, Ling Zhou, Peter X-K Song
Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called MSE grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide an MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion exhibits superiority over its competitors in terms of grouping accuracy.
SUPERVISED HOMOGENEITY FUSION: A COMBINATORIAL APPROACH. Wen Wang, Shihao Wu, Ziwei Zhu, Ling Zhou, Peter X-K Song. Annals of Statistics 52(1): 285-310, 2024-02-01. DOI: 10.1214/23-aos2347. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12327361/pdf/
The effect of the order in which a set of m treatments is applied can be modeled by relative-position factors that indicate whether treatment i is carried out before or after treatment j, or by the absolute position for treatment i in the sequence. A design with the same normalized information matrix as the design with all m! sequences is D- and G-optimal for the main-effects model involving the relative-position factors. We prove that such designs are also I-optimal for this model and D-optimal as well as G- and I-optimal for the first-order model in the absolute-position factors. We propose a methodology for a complete or partial enumeration of nonequivalent designs that are optimal for both models.
Order-of-addition orthogonal arrays to study the effect of treatment ordering. Eric D. Schoen, Robert W. Mee. Annals of Statistics, 2023-08-01. DOI: 10.1214/23-aos2317. URL: https://doi.org/10.1214/23-aos2317
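The relative-position factors and the normalized information matrix mentioned in the abstract above can be made concrete with a small sketch. The coding used here is an assumption for illustration: each factor is +1 if treatment i is applied before treatment j and -1 otherwise, an intercept column is included, and the function names `pwo_row`, `normalized_information` and `matches_full_design` are hypothetical. A candidate design is then compared with the design containing all m! sequences.

```python
from fractions import Fraction
from itertools import combinations, permutations

def pwo_row(sequence):
    """Intercept plus one +/-1 relative-position factor per pair i < j."""
    position = {treatment: k for k, treatment in enumerate(sequence)}
    m = len(sequence)
    return [1] + [1 if position[i] < position[j] else -1
                  for i, j in combinations(range(m), 2)]

def normalized_information(design):
    """X'X / N for the main-effects model, computed exactly."""
    rows = [pwo_row(seq) for seq in design]
    n, p = len(rows), len(rows[0])
    return [[Fraction(sum(r[a] * r[b] for r in rows), n) for b in range(p)]
            for a in range(p)]

def matches_full_design(design, m):
    """True if the design has the same normalized information matrix as all m! sequences."""
    full = list(permutations(range(m)))
    return normalized_information(design) == normalized_information(full)

M_full = normalized_information(list(permutations(range(3))))
# Columns: intercept, (0 before 1), (0 before 2), (1 before 2).
print(M_full[1][2], M_full[1][3])  # 1/3 -1/3
print(matches_full_design([(0, 1, 2), (1, 2, 0), (2, 0, 1)], 3))  # False
```

The three cyclic shifts fail the check because the factor "0 before 1" is not balanced across them. Exact rational arithmetic is used so that the equality test is not affected by rounding.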
For two correlated graphs which are independently sub-sampled from a common Erdős–Rényi graph $G(n,p)$, we wish to recover their latent vertex matching from the observation of these two graphs without labels. When $p = n^{-\alpha+o(1)}$ for $\alpha \in (0,1]$, we establish a sharp information-theoretic threshold for whether it is possible to correctly match a positive fraction of vertices. Our result sharpens a constant factor in a recent work by Wu, Xu and Yu.
Matching recovery threshold for correlated random graphs. Jian Ding, Hang Du. Annals of Statistics, 2023-08-01. DOI: 10.1214/23-aos2305. URL: https://doi.org/10.1214/23-aos2305
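The data-generating model described in the abstract above can be simulated with a short sketch. The subsampling probability `s`, the function name `correlated_graphs` and the return layout are assumptions for illustration. A parent graph is drawn from G(n, p), each of the two observed graphs keeps every parent edge independently with probability `s`, and the second graph is relabelled by a hidden random permutation that a matching algorithm would try to recover.

```python
import random
from itertools import combinations

def correlated_graphs(n, p, s, seed=0):
    """Sample two correlated graphs by subsampling a common G(n, p) parent.

    Returns (parent, g1, g2_relabelled, pi), where graphs are sets of
    edges (u, v) with u < v and pi[u] is the hidden label of vertex u
    in the second graph.
    """
    rng = random.Random(seed)
    parent = {e for e in combinations(range(n), 2) if rng.random() < p}
    g1 = {e for e in parent if rng.random() < s}  # first subsample
    g2 = {e for e in parent if rng.random() < s}  # independent second subsample
    pi = list(range(n))
    rng.shuffle(pi)                               # latent vertex matching
    g2_relabelled = {tuple(sorted((pi[u], pi[v]))) for u, v in g2}
    return parent, g1, g2_relabelled, pi

parent, g1, g2, pi = correlated_graphs(n=200, p=0.05, s=0.8, seed=1)
print(len(parent), len(g1), len(g2))
```

Only `g1` and `g2` would be shown to a matching procedure; `parent` and `pi` are returned so that the recovered matching can be scored against the truth.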