Annals of Statistics: Latest Articles

Envelope-based sparse partial least squares
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2020-02-01 | DOI: 10.1214/18-aos1796
G. Zhu, Zhihua Su
Citations: 23
CONSISTENT SELECTION OF THE NUMBER OF CHANGE-POINTS VIA SAMPLE-SPLITTING.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2020-02-01 | Epub Date: 2020-02-17 | DOI: 10.1214/19-aos1814
Changliang Zou, Guanghui Wang, Runze Li

In multiple change-point analysis, one of the major challenges is to estimate the number of change-points. Most existing approaches attempt to minimize a Schwarz information criterion which balances a term quantifying model fit with a penalization term accounting for model complexity that increases with the number of change-points and limits overfitting. However, different penalization terms are required to adapt to different contexts of multiple change-point problems, and the optimal penalization magnitude usually varies with the model and error distribution. We propose a data-driven selection criterion that is applicable to most kinds of popular change-point detection methods, including binary segmentation and optimal partitioning algorithms. The key idea is to select the number of change-points that minimizes the squared prediction error, which measures the fit of a specified model for a new sample. We develop a cross-validation estimation scheme based on an order-preserved sample-splitting strategy, and establish its asymptotic selection consistency under some mild conditions. Effectiveness of the proposed selection criterion is demonstrated on a variety of numerical experiments and real-data examples.
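As a rough illustration of the selection rule described in this abstract, the sketch below splits a sequence into odd- and even-indexed halves (one possible order-preserving split, assumed here for concreteness), fits a piecewise-constant mean with K change-points on the first half by dynamic programming, and picks the K with the smallest squared prediction error on the second half. The function names `best_segmentations` and `select_num_changepoints`, the piecewise-constant mean model, and the single odd/even split are assumptions made for illustration; this is not the authors' implementation.

```python
from itertools import accumulate


def best_segmentations(y, max_k):
    """For each k in 0..max_k, return the k change-point positions that
    minimize the within-segment sum of squares of y (needs len(y) > max_k)."""
    n = len(y)
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(i, j):
        # Sum of squared deviations of y[i:j] from its own mean (j > i).
        seg = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg * seg / (j - i)

    inf = float("inf")
    # dp[k][j]: minimal cost of cutting y[:j] into k + 1 segments.
    dp = [[inf] * (n + 1) for _ in range(max_k + 1)]
    arg = [[0] * (n + 1) for _ in range(max_k + 1)]
    for j in range(1, n + 1):
        dp[0][j] = cost(0, j)
    for k in range(1, max_k + 1):
        for j in range(k + 1, n + 1):
            for i in range(k, j):
                c = dp[k - 1][i] + cost(i, j)
                if c < dp[k][j]:
                    dp[k][j], arg[k][j] = c, i
    out = {}
    for k in range(max_k + 1):
        cps, j = [], n
        for kk in range(k, 0, -1):  # backtrack the last cut at each level
            j = arg[kk][j]
            cps.append(j)
        out[k] = sorted(cps)
    return out


def select_num_changepoints(y, max_k):
    """Pick the number of change-points by held-out squared prediction error."""
    train, valid = y[0::2], y[1::2]  # order-preserving odd/even split
    m = min(len(train), len(valid))
    train, valid = train[:m], valid[:m]
    segs = best_segmentations(train, max_k)
    errors = []
    for k in range(max_k + 1):
        bounds = [0] + segs[k] + [m]
        err = 0.0
        for a, b in zip(bounds[:-1], bounds[1:]):
            seg_mean = sum(train[a:b]) / (b - a)
            err += sum((v - seg_mean) ** 2 for v in valid[a:b])
        errors.append(err)
    k_hat = min(range(max_k + 1), key=errors.__getitem__)
    return k_hat, errors
```

Calling `select_num_changepoints(y, max_k)` returns the selected number of change-points together with the held-out error for every candidate K, so the error curve can be inspected. The triple loop costs O(max_k · n²) time, which is fine for short series but would need a faster search for long ones.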

Citations: 28
MODEL ASSISTED VARIABLE CLUSTERING: MINIMAX-OPTIMAL RECOVERY AND ALGORITHMS.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2020-02-01 | Epub Date: 2020-02-17 | DOI: 10.1214/18-aos1794
Florentina Bunea, Christophe Giraud, Xi Luo, Martin Royer, Nicolas Verzelen

The problem of variable clustering is that of estimating groups of similar components of a p-dimensional vector X = (X_1, …, X_p) from n independent copies of X. There exists a large number of algorithms that return data-dependent groups of variables, but their interpretation is limited to the algorithm that produced them. An alternative is model-based clustering, in which one begins by defining population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations with all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we compare our methods with another popular clustering method, spectral clustering. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.
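To make the idea of "similar associations with all other variables" concrete, the toy sketch below builds a population covariance matrix with block structure (variables are noisy copies of cluster-level latent factors) and groups variables whose covariances with every third variable agree. The dissimilarity `max_cov_difference`, the greedy rule `threshold_clusters`, and all numbers are illustrative assumptions; they are stand-ins for the general idea and should not be read as the COD or PECOK algorithms of the paper, which work with sample estimates and come with guarantees.

```python
def block_covariance(labels, C, noise_var):
    """Sigma[a][b] = C[g(a)][g(b)], plus noise_var[a] on the diagonal,
    where g(a) = labels[a] is the cluster of variable a."""
    p = len(labels)
    return [[C[labels[a]][labels[b]] + (noise_var[a] if a == b else 0.0)
             for b in range(p)] for a in range(p)]


def max_cov_difference(sigma, a, b):
    """Largest |Sigma[a][c] - Sigma[b][c]| over all third variables c."""
    return max(abs(sigma[a][c] - sigma[b][c])
               for c in range(len(sigma)) if c not in (a, b))


def threshold_clusters(sigma, alpha):
    """Greedy grouping: variables within alpha of a seed join its cluster."""
    unassigned = list(range(len(sigma)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        group = [v for v in unassigned
                 if v == seed or max_cov_difference(sigma, seed, v) <= alpha]
        clusters.append(group)
        unassigned = [v for v in unassigned if v not in group]
    return clusters


# Five variables in two clusters; C holds the latent-factor covariances.
labels = [0, 0, 1, 1, 1]
C = [[1.0, 0.3], [0.3, 2.0]]
noise_var = [0.5, 0.7, 0.2, 0.4, 0.9]
sigma = block_covariance(labels, C, noise_var)
print(threshold_clusters(sigma, alpha=0.1))  # -> [[0, 1], [2, 3, 4]]
```

Variables in the same cluster have identical off-diagonal covariance profiles here, so their dissimilarity is exactly zero, while variables from different clusters differ by at least the gap between the corresponding entries of `C`. With data, `sigma` would be replaced by an estimate and `alpha` would have to absorb the estimation error.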

Citations: 23
Detecting relevant changes in the mean of nonstationary processes—A mass excess approach
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-12-01 | DOI: 10.1214/19-aos1811
H. Dette, Weichi Wu
This paper considers the problem of testing if a sequence of means (μ_t)_{t=1,...,n} of a non-stationary time series (X_t)_{t=1,...,n} is stable in the sense that the difference of the means μ_1 and μ_t between the initial time t = 1 and any other time is smaller than a given threshold, that is, |μ_1 − μ_t| ≤ c for all t = 1, ..., n. A test for hypotheses of this type is developed using a bias corrected monotone rearranged local linear estimator, and asymptotic normality of the corresponding test statistic is established. As the asymptotic variance depends on the location of the roots of the equation |μ_1 − μ_t| = c, a new bootstrap procedure is proposed to obtain critical values and its consistency is established. As a consequence we are able to quantitatively describe relevant deviations of a non-stationary sequence from its initial value. The results are illustrated by means of a simulation study and by analyzing data examples.
Citations: 25
Joint convergence of sample autocovariance matrices when $p/n \to 0$ with application
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-12-01 | DOI: 10.1214/18-aos1785
M. Bhattacharjee, A. Bose
Consider a high dimensional linear time series model where the dimension p and the sample size n grow in such a way that p/n → 0. Let Γ̂_u be the uth order sample autocovariance matrix. We first show that the limiting spectral distribution (LSD) of any symmetric polynomial in {Γ̂_u, Γ̂_u^*, u ≥ 0} exists under independence and moment assumptions on the driving sequence together with weak assumptions on the coefficient matrices. This LSD result, with some additional effort, implies the asymptotic normality of the trace of any polynomial in {Γ̂_u, Γ̂_u^*, u ≥ 0}. We also study similar results for several independent MA processes. We show applications of the above results to statistical inference problems such as in estimation of the unknown order of a high-dimensional MA process and in graphical and significance tests for hypotheses on coefficient matrices of one or several such independent processes.
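For reference, one common convention for the sample autocovariance matrices of a mean-zero p-dimensional series X_1, …, X_n is written out below. The 1/n normalization and the use of the transpose are assumptions made here, since the abstract does not spell out the definition.

```latex
\hat{\Gamma}_u \;=\; \frac{1}{n} \sum_{t=u+1}^{n} X_t X_{t-u}^{\top},
\qquad u = 0, 1, 2, \ldots
```

Under this convention the lag-zero matrix is symmetric, but the matrices at lags u ≥ 1 generally are not. Symmetric combinations such as $\hat{\Gamma}_u + \hat{\Gamma}_u^{\top}$ or $\hat{\Gamma}_u \hat{\Gamma}_u^{\top}$ have real eigenvalues, which is why a spectral distribution is studied for symmetric polynomials.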
Citations: 6
TEST FOR HIGH DIMENSIONAL CORRELATION MATRICES.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-10-01 | Epub Date: 2019-08-03 | DOI: 10.1214/18-AOS1768
Shurong Zheng, Guanghui Cheng, Jianhua Guo, Hongtu Zhu

Testing correlation structures has attracted extensive attention in the literature due to both its importance in real applications and several major theoretical challenges. The aim of this paper is to develop a general framework of testing correlation structures for the one-, two-, and multiple sample testing problems under a high-dimensional setting when both the sample size and data dimension go to infinity. Our test statistics are designed to deal with both the dense and sparse alternatives. We systematically investigate the asymptotic null distribution, power function, and unbiasedness of each test statistic. Theoretically, we make great efforts to deal with the non-independency of all random matrices of the sample correlation matrices. We use simulation studies and real data analysis to illustrate the versatility and practicability of our test statistics.

Citations: 18
EIGENVALUE DISTRIBUTIONS OF VARIANCE COMPONENTS ESTIMATORS IN HIGH-DIMENSIONAL RANDOM EFFECTS MODELS.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-10-01 | Epub Date: 2019-08-03 | DOI: 10.1214/18-AOS1767
Fan Zhou, Iain M Johnstone

We study the spectra of MANOVA estimators for variance component covariance matrices in multivariate random effects models. When the dimensionality of the observations is large and comparable to the number of realizations of each random effect, we show that the empirical spectra of such estimators are well-approximated by deterministic laws. The Stieltjes transforms of these laws are characterized by systems of fixed-point equations, which are numerically solvable by a simple iterative procedure. Our proof uses operator-valued free probability theory, and we establish a general asymptotic freeness result for families of rectangular orthogonally-invariant random matrices, which is of independent interest. Our work is motivated in part by the estimation of components of covariance between multiple phenotypic traits in quantitative genetics, and we specialize our results to common experimental designs that arise in this application.
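The fixed-point systems of the paper are not reproduced in this abstract, so the sketch below uses a textbook analogue instead: the Marchenko-Pastur law for a sample covariance matrix with identity population covariance and aspect ratio gamma = p/n. Its companion Stieltjes transform v(z) satisfies v = -1 / (z - gamma / (1 + v)), and iterating this map from a point in the upper half-plane is the kind of "simple iterative procedure" the abstract alludes to. The function names and the choice of this particular law are assumptions for illustration only.

```python
def companion_stieltjes(z, gamma, tol=1e-12, max_iter=10_000):
    """Solve v = -1 / (z - gamma / (1 + v)) by fixed-point iteration,
    for z in the upper half-plane (Im z > 0)."""
    v = -1.0 / z  # large-|z| behaviour of a Stieltjes transform
    for _ in range(max_iter):
        v_new = -1.0 / (z - gamma / (1.0 + v))
        if abs(v_new - v) < tol:
            return v_new
        v = v_new
    raise RuntimeError("fixed-point iteration did not converge")


def mp_stieltjes(z, gamma):
    """Stieltjes transform m(z) of the Marchenko-Pastur law with ratio
    gamma = p/n, recovered from the companion transform v(z)."""
    v = companion_stieltjes(z, gamma)
    return (v + (1.0 - gamma) / z) / gamma


z, gamma = complex(1.0, 1.0), 0.5
m = mp_stieltjes(z, gamma)
# m should solve gamma*z*m^2 - (1 - gamma - z)*m + 1 = 0 with Im m > 0.
residual = abs(gamma * z * m * m - (1.0 - gamma - z) * m + 1.0)
print(residual, m.imag > 0)
```

The iteration is written for the companion transform rather than for m(z) directly because that map sends the upper half-plane into itself, which keeps the iterates on the correct branch. Evaluating `mp_stieltjes` at z = x + i·eta for small eta and taking the imaginary part divided by pi gives a numerical approximation to the limiting density at x.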

Citations: 13
LINEAR HYPOTHESIS TESTING FOR HIGH DIMENSIONAL GENERALIZED LINEAR MODELS.
IF 3.2 | Zone 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Pub Date: 2019-10-01 | Epub Date: 2019-08-03 | DOI: 10.1214/18-AOS1761
Chengchun Shi, Rui Song, Zhao Chen, Runze Li

This paper is concerned with testing linear hypotheses in high-dimensional generalized linear models. To deal with linear hypotheses, we first propose a constrained partial regularization method and study its statistical properties. We further introduce an algorithm for solving regularization problems with folded-concave penalty functions and linear constraints. To test linear hypotheses, we propose a partial penalized likelihood ratio test, a partial penalized score test and a partial penalized Wald test. We show that the limiting null distributions of these three test statistics are χ² distributions with the same degrees of freedom, and under local alternatives, they asymptotically follow non-central χ² distributions with the same degrees of freedom and noncentrality parameter, provided the number of parameters involved in the test hypothesis grows to ∞ at a certain rate. Simulation studies are conducted to examine the finite sample performance of the proposed tests. Empirical analysis of a real data example is used to illustrate the proposed testing procedures.

Citations: 0
ON TESTING CONDITIONAL QUALITATIVE TREATMENT EFFECTS.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-08-01 | Epub Date: 2019-05-21 | DOI: 10.1214/18-AOS1750
Chengchun Shi, Rui Song, Wenbin Lu

Precision medicine is an emerging medical paradigm that focuses on finding the most effective treatment strategy tailored for individual patients. In the literature, most of the existing works focused on estimating the optimal treatment regime. However, there has been less attention devoted to hypothesis testing regarding the optimal treatment regime. In this paper, we first introduce the notion of conditional qualitative treatment effects (CQTE) of a set of variables given another set of variables and provide a class of equivalent representations for the null hypothesis of no CQTE. The proposed definition of CQTE does not assume any parametric form for the optimal treatment rule and plays an important role for assessing the incremental value of a set of new variables in optimal treatment decision making conditional on an existing set of prescriptive variables. We then propose novel testing procedures for no CQTE based on kernel estimation of the conditional contrast functions. We show that our test statistics have asymptotically correct size and non-negligible power against some nonstandard local alternatives. The empirical performance of the proposed tests is evaluated by simulations and an application to an AIDS data set.

Citations: 3
A ROBUST AND EFFICIENT APPROACH TO CAUSAL INFERENCE BASED ON SPARSE SUFFICIENT DIMENSION REDUCTION.
IF 4.5 | Zone 1 (Mathematics) | Q1 Mathematics | Pub Date: 2019-06-01 | Epub Date: 2019-02-13 | DOI: 10.1214/18-AOS1722
Shujie Ma, Liping Zhu, Zhiwei Zhang, Chih-Ling Tsai, Raymond J Carroll

A fundamental assumption used in causal inference with observational data is that treatment assignment is ignorable given measured confounding variables. This assumption of no missing confounders is plausible if a large number of baseline covariates are included in the analysis, as we often have no prior knowledge of which variables can be important confounders. Thus, estimation of treatment effects with a large number of covariates has received considerable attention in recent years. Most existing methods require specifying certain parametric models involving the outcome, treatment and confounding variables, and employ a variable selection procedure to identify confounders. However, selection of a proper set of confounders depends on correct specification of the working models. The bias due to model misspecification and incorrect selection of confounding variables can yield misleading results. We propose a robust and efficient approach for inference about the average treatment effect via a flexible modeling strategy incorporating penalized variable selection. Specifically, we consider an estimator constructed based on an efficient influence function that involves a propensity score and an outcome regression. We then propose a new sparse sufficient dimension reduction method to estimate these two functions without making restrictive parametric modeling assumptions. The proposed estimator of the average treatment effect is asymptotically normal and semiparametrically efficient without the need for variable selection consistency. The proposed methods are illustrated via simulation studies and a biomedical application.
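An estimator built from an efficient influence function that combines a propensity score with an outcome regression is commonly written in the augmented inverse-probability-weighted (AIPW) form. The sketch below computes that standard form from fitted nuisance values supplied by the caller. It assumes the AIPW form is the relevant one and deliberately leaves out how the two nuisance functions are estimated, which in the paper is done by sparse sufficient dimension reduction; the function name `aipw_ate` and its arguments are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """Augmented inverse-probability-weighted estimate of the average
    treatment effect, with a plug-in standard error.

    y: outcomes; a: 0/1 treatment indicators; e_hat: fitted propensity
    scores P(A = 1 | X), strictly between 0 and 1; mu1_hat, mu0_hat:
    fitted outcome regressions E[Y | A = 1, X] and E[Y | A = 0, X].
    All arguments are equal-length sequences (at least two units).
    """
    phi = [
        m1 - m0 + ai * (yi - m1) / e - (1 - ai) * (yi - m0) / (1.0 - e)
        for yi, ai, e, m1, m0 in zip(y, a, e_hat, mu1_hat, mu0_hat)
    ]
    estimate = mean(phi)
    std_err = stdev(phi) / sqrt(len(phi))
    return estimate, std_err


# Toy numbers, made up for illustration: two units, one treated.
est, se = aipw_ate(
    y=[3.0, 1.0], a=[1, 0], e_hat=[0.5, 0.5],
    mu1_hat=[2.0, 2.0], mu0_hat=[1.0, 1.5],
)
print(est, se)
```

Each term of `phi` is the regression contrast plus an inverse-probability-weighted residual correction, so the average remains consistent if either the propensity model or the outcome model is correct. The standard error uses the sample standard deviation of `phi`, which is the usual plug-in choice for influence-function-based estimators.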

Citations: 25