In this paper, we derive the limit of experiments for one-parameter Ising models on dense regular graphs. In particular, we show that the limiting experiment is Gaussian in the low temperature regime, non-Gaussian in the critical regime, and an infinite collection of Gaussians in the high temperature regime. We also derive the limiting distributions of the maximum likelihood and maximum pseudo-likelihood estimators, and study limiting power for tests of hypothesis against contiguous alternatives (whose scaling changes across the regimes). To the best of our knowledge, this is the first attempt at establishing the classical limits of experiments for Ising models (and more generally, Markov random fields).
{"title":"Inference in Ising models on dense regular graphs","authors":"Yuanzhe Xu, S. Mukherjee","doi":"10.1214/23-aos2286","DOIUrl":"https://doi.org/10.1214/23-aos2286","url":null,"abstract":"In this paper, we derive the limit of experiments for one-parameter Ising models on dense regular graphs. In particular, we show that the limiting experiment is Gaussian in the low temperature regime, non-Gaussian in the critical regime, and an infinite collection of Gaussians in the high temperature regime. We also derive the limiting distributions of the maximum likelihood and maximum pseudo-likelihood estimators, and study limiting power for tests of hypothesis against contiguous alternatives (whose scaling changes across the regimes). To the best of our knowledge, this is the first attempt at establishing the classical limits of experiments for Ising models (and more generally, Markov random fields).","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82979344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
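To make the maximum pseudo-likelihood estimator (MPLE) mentioned above concrete, here is a minimal sketch. It assumes a one-parameter Ising model on a d-regular graph with adjacency matrix A, where the local field is m_i = (1/d) Σ_j A_ij x_j and P(x_i = s | rest) ∝ exp(β s m_i). The normalisation by d, the function names and the hand-picked spin configuration are illustrative assumptions, not the authors' code. The log pseudo-likelihood Σ_i [β x_i m_i − log(2 cosh(β m_i))] is concave in β, so its score can be solved by bisection.

```python
import numpy as np

def local_fields(x, adj, degree):
    """m_i = (1/d) * sum_j A_ij x_j for a d-regular graph (assumed scaling)."""
    return adj @ x / degree

def pseudo_score(beta, m, x):
    """Derivative in beta of sum_i [beta*x_i*m_i - log(2*cosh(beta*m_i))]."""
    return float(np.sum(m * (x - np.tanh(beta * m))))

def mple(x, adj, degree, beta_max=50.0, tol=1e-10):
    m = local_fields(x, adj, degree)
    # Concave log pseudo-likelihood => score is non-increasing in beta.
    if pseudo_score(0.0, m, x) <= 0.0:
        return 0.0  # maximiser over beta >= 0 is at the boundary
    lo, hi = 0.0, beta_max
    if pseudo_score(hi, m, x) > 0.0:
        return hi  # no root below beta_max (e.g. almost all spins aligned)
    while hi - lo > tol:  # bisection on the score
        mid = 0.5 * (lo + hi)
        if pseudo_score(mid, m, x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 500
adj = np.ones((n, n)) - np.eye(n)        # complete graph: (n-1)-regular, densest case
x = np.array([1.0] * 450 + [-1.0] * 50)  # hypothetical observed spins
beta_hat = mple(x, adj, degree=n - 1)
```

The configuration has magnetisation 0.8. Its MPLE lands close to the mean-field value atanh(0.8)/0.8 ≈ 1.37. A perfectly balanced configuration returns β̂ = 0, because the score at zero equals (S² − n)/(n − 1) with S = Σ x_i.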
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.
{"title":"Optimal discriminant analysis in high-dimensional latent factor models","authors":"Xin Bing, M. Wegkamp","doi":"10.1214/23-aos2289","DOIUrl":"https://doi.org/10.1214/23-aos2289","url":null,"abstract":"In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"370 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80450430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
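The two-step idea above (project onto leading principal components, then classify) can be sketched as follows. This sketch assumes two classes, a fixed number k of retained PCs (the paper selects k in a data-driven way, which is not reproduced here) and Fisher's linear discriminant rule in the projected space. The simulated latent-factor data and all function names are hypothetical.

```python
import numpy as np

def fit_pc_lda(X, y, k):
    """Project centred features onto the top-k PCs, then fit two-class LDA."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T                                  # p x k leading principal directions
    Z = Xc @ V                                    # n x k projected features
    mu0, mu1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    resid = np.vstack([Z[y == 0] - mu0, Z[y == 1] - mu1])
    S = resid.T @ resid / (len(y) - 2)            # pooled within-class covariance
    w = np.linalg.solve(S, mu1 - mu0)
    pi1 = y.mean()
    b = -0.5 * (mu0 + mu1) @ w + np.log(pi1 / (1 - pi1))
    return mean, V, w, b

def predict(model, X):
    mean, V, w, b = model
    return ((X - mean) @ V @ w + b > 0).astype(int)

# Hypothetical latent-factor data: X = Z A^T + noise, class shifts the first factor.
rng = np.random.default_rng(0)
p, k = 50, 2
A = rng.normal(size=(p, k))

def sample(n):
    y = rng.integers(0, 2, size=n)
    Z = rng.normal(size=(n, k))
    Z[:, 0] += np.where(y == 1, 2.0, -2.0)
    return Z @ A.T + rng.normal(size=(n, p)), y

Xtr, ytr = sample(400)
Xte, yte = sample(1000)
model = fit_pc_lda(Xtr, ytr, k)
acc = float((predict(model, Xte) == yte).mean())
```

Here the latent signal dominates the leading PCs, so a k-dimensional LDA rule recovers nearly Bayes-level accuracy even though p = 50.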
{"title":"Correction note: “Asymptotic spectral theory for nonlinear time series”","authors":"Y. Zhang, X. Shao, Weibiao Wu","doi":"10.1214/22-aos2206","DOIUrl":"https://doi.org/10.1214/22-aos2206","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"97 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73840781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou
Censored quantile regression (CQR) has become a valuable tool to study the heterogeneous association between a possibly censored outcome and a set of covariates, yet computation and statistical inference for CQR have remained a challenge for large-scale data with many covariates. In this paper, we focus on a smoothed martingale-based sequential estimating equations approach, to which scalable gradient-based algorithms can be applied. Theoretically, we provide a unified analysis of the smoothed sequential estimator and its penalized counterpart in increasing dimensions. When the covariate dimension grows with the sample size at a sublinear rate, we establish the uniform convergence rate (over a range of quantile indexes) and provide a rigorous justification for the validity of a multiplier bootstrap procedure for inference. In high-dimensional sparse settings, our results considerably improve the existing work on CQR by relaxing an exponential term of sparsity. We also demonstrate the advantage of the smoothed CQR over existing methods with both simulated experiments and data applications.
{"title":"Scalable estimation and inference for censored quantile regression process","authors":"Xuming He, Xiaoou Pan, Kean Ming Tan, Wen-Xin Zhou","doi":"10.1214/22-aos2214","DOIUrl":"https://doi.org/10.1214/22-aos2214","url":null,"abstract":"Censored quantile regression (CQR) has become a valuable tool to study the heterogeneous association between a possibly censored outcome and a set of covariates, yet computation and statistical inference for CQR have remained a challenge for large-scale data with many covariates. In this paper, we focus on a smoothed martingale-based sequential estimating equations approach, to which scalable gradient-based algorithms can be applied. Theoretically, we provide a unified analysis of the smoothed sequential estimator and its penalized counterpart in increasing dimensions. When the covariate dimension grows with the sample size at a sublinear rate, we establish the uniform convergence rate (over a range of quantile indexes) and provide a rigorous justification for the validity of a multiplier bootstrap procedure for inference. In high-dimensional sparse settings, our results considerably improve the existing work on CQR by relaxing an exponential term of sparsity. We also demonstrate the advantage of the smoothed CQR over existing methods with both simulated experiments and data applications.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85860062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
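The reason smoothing makes gradient-based algorithms applicable can be illustrated on plain, uncensored quantile regression. This is a deliberate simplification: it omits censoring and the martingale-based sequential estimating equations entirely, and the logistic kernel, bandwidth h, step size and iteration count are all illustrative assumptions. Convolving the check loss ρ_τ with a kernel of bandwidth h gives a smooth loss whose gradient in β is (1/n) Σ_i x_i (G((x_iᵀβ − y_i)/h) − τ), where G is the kernel CDF (here the logistic CDF).

```python
import numpy as np

def logistic_cdf(z):
    # 0.5 * (1 + tanh(z / 2)) equals 1 / (1 + exp(-z)) without overflow
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def smoothed_qr(X, y, tau, h=0.5, step=1.0, n_iter=500):
    """Gradient descent on the kernel-smoothed check loss (no censoring)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = X @ beta - y
        grad = X.T @ (logistic_cdf(r / h) - tau) / n
        beta -= step * grad
    return beta

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)   # hypothetical data, median line 1 + 2x
beta_hat = smoothed_qr(X, y, tau=0.5)
```

With symmetric errors and τ = 0.5 the smoothed estimate should sit near (1, 2). For other τ the bandwidth introduces a bias that shrinks as h decreases.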
This paper develops a foundation of methodology and theory for nonparametric regression with Lie group-valued predictors contaminated by measurement errors. Our methodology and theory are based on harmonic analysis on Lie groups, which is largely unknown in statistics. We establish a novel deconvolution regression estimator, and study its rate of convergence and asymptotic distribution. We also provide asymptotic confidence intervals based on the asymptotic distribution of the estimator and on the empirical likelihood technique. Several theoretical properties are also studied for a deconvolution density estimator, which is necessary to construct our regression estimator. The case of unknown measurement error distribution is also covered. We present practical details on implementation as well as the results of simulation studies for several Lie groups. A real data example is also provided.
{"title":"Nonparametric regression on Lie groups with measurement errors","authors":"Jeong Min Jeon, B. Park, I. Van Keilegom","doi":"10.1214/22-aos2218","DOIUrl":"https://doi.org/10.1214/22-aos2218","url":null,"abstract":"This paper develops a foundation of methodology and theory for nonparametric regression with Lie group-valued predictors contaminated by measurement errors. Our methodology and theory are based on harmonic analysis on Lie groups, which is largely unknown in statistics. We establish a novel deconvolution regression estimator, and study its rate of convergence and asymptotic distribution. We also provide asymptotic confidence intervals based on the asymptotic distribution of the estimator and on the empirical likelihood technique. Several theoretical properties are also studied for a deconvolution density estimator, which is necessary to construct our regression estimator. The case of unknown measurement error distribution is also covered. We present practical details on implementation as well as the results of simulation studies for several Lie groups. A real data example is also provided.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77654455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
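The harmonic-analysis idea behind the deconvolution density estimator can be seen on the simplest compact Lie group, the circle SO(2). This sketch assumes an observation W = X + ε (mod 2π), where the error ε is wrapped normal with known σ and independent of X. The Fourier coefficients then factor as E[e^{−ikW}] = E[e^{−ikX}] · e^{−σ²k²/2}. Dividing by the error coefficients and truncating the series at |k| ≤ K gives the estimate. The truncation level K, σ and the von Mises test data are illustrative assumptions. The paper handles general Lie groups and also regression, neither of which is shown here.

```python
import numpy as np

def deconv_density_circle(w, theta, sigma, K):
    """Truncated Fourier deconvolution density estimate on the circle."""
    ks = np.arange(-K, K + 1)
    emp = np.exp(-1j * np.outer(ks, w)).mean(axis=1)   # empirical E[exp(-ikW)]
    err = np.exp(-0.5 * (sigma * ks) ** 2)             # wrapped-normal error coefficients
    coef = emp / err                                   # estimated E[exp(-ikX)]
    # Inversion: f(theta) = (1 / 2pi) * sum_k c_k exp(ik theta)
    f = (coef[:, None] * np.exp(1j * np.outer(ks, theta))).sum(axis=0) / (2 * np.pi)
    return f.real

rng = np.random.default_rng(0)
n, sigma = 2000, 0.3
x = rng.vonmises(np.pi, 8.0, size=n)                   # hypothetical true angles
w = np.mod(x + rng.normal(0.0, sigma, size=n), 2 * np.pi)
grid = np.linspace(0.0, 2 * np.pi, 512, endpoint=False)
f = deconv_density_circle(w, grid, sigma, K=5)
```

On a uniform grid with more points than 2K + 1, only the k = 0 term survives integration, so the estimate integrates to one exactly. The factor e^{σ²k²/2} shows why K must stay small when σ is large: it amplifies sampling noise in the high-frequency coefficients.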
We consider the fundamental problem of estimating the location of a $d$-variate probability measure under an $L_p$ loss function. The naive estimator, which minimizes the usual empirical $L_p$ risk, has a known asymptotic behavior but suffers from several deficiencies for $p \neq 2$, the most important one being the lack of equivariance under general affine transformations. In this work, we introduce a collection of $L_p$ location estimators $\hat{\mu}_{p,\ell}^{(n)}$ that minimize the size of suitable $\ell$-dimensional data-based simplices. For $\ell = 1$, these estimators reduce to the naive ones, whereas, for $\ell = d$, they are equivariant under affine transformations. Irrespective of $\ell$, these estimators reduce to the sample mean for $p = 2$, whereas for $p = 1$, the estimators provide the well-known spatial median and Oja median for $\ell = 1$ and $\ell = d$, respectively. Under very mild assumptions, we derive an explicit Bahadur representation result for $\hat{\mu}_{p,\ell}^{(n)}$ and establish asymptotic normality. We prove that, quite remarkably, the asymptotic behavior of the estimators does not depend on $\ell$ under spherical symmetry, so that the affine equivariance for $\ell = d$ is achieved at no cost in terms of efficiency. To allow for large sample size $n$ and/or large dimension $d$, we introduce a version of our estimators relying on incomplete U-statistics. Under a centro-symmetry assumption, we also define companion tests $\phi_{p,\ell}^{(n)}$ for the problem of testing the null hypothesis that the location $\mu$ of the underlying probability measure coincides with a given location $\mu_0$. For any $p$, affine invariance is achieved for $\ell = d$. For any $\ell$ and $p$, we derive explicit expressions for the asymptotic power of these tests under contiguous local alternatives, which reveal that asymptotic relative efficiencies with respect to traditional parametric Gaussian procedures for hypothesis testing coincide with those obtained for point estimation. We illustrate the finite-sample relevance of our asymptotic results through Monte Carlo exercises and also treat a real data example.
{"title":"Affine-equivariant inference for multivariate location under Lp loss functions","authors":"A. Dürre, D. Paindaveine","doi":"10.1214/22-aos2199","DOIUrl":"https://doi.org/10.1214/22-aos2199","url":null,"abstract":"We consider the fundamental problem of estimating the location of a $d$-variate probability measure under an $L_p$ loss function. The naive estimator, which minimizes the usual empirical $L_p$ risk, has a known asymptotic behavior but suffers from several deficiencies for $p \\neq 2$, the most important one being the lack of equivariance under general affine transformations. In this work, we introduce a collection of $L_p$ location estimators $\\hat{\\mu}_{p,\\ell}^{(n)}$ that minimize the size of suitable $\\ell$-dimensional data-based simplices. For $\\ell = 1$, these estimators reduce to the naive ones, whereas, for $\\ell = d$, they are equivariant under affine transformations. Irrespective of $\\ell$, these estimators reduce to the sample mean for $p = 2$, whereas for $p = 1$, the estimators provide the well-known spatial median and Oja median for $\\ell = 1$ and $\\ell = d$, respectively. Under very mild assumptions, we derive an explicit Bahadur representation result for $\\hat{\\mu}_{p,\\ell}^{(n)}$ and establish asymptotic normality. We prove that, quite remarkably, the asymptotic behavior of the estimators does not depend on $\\ell$ under spherical symmetry, so that the affine equivariance for $\\ell = d$ is achieved at no cost in terms of efficiency. To allow for large sample size $n$ and/or large dimension $d$, we introduce a version of our estimators relying on incomplete U-statistics. Under a centro-symmetry assumption, we also define companion tests $\\phi_{p,\\ell}^{(n)}$ for the problem of testing the null hypothesis that the location $\\mu$ of the underlying probability measure coincides with a given location $\\mu_0$. For any $p$, affine invariance is achieved for $\\ell = d$. For any $\\ell$ and $p$, we derive explicit expressions for the asymptotic power of these tests under contiguous local alternatives, which reveal that asymptotic relative efficiencies with respect to traditional parametric Gaussian procedures for hypothesis testing coincide with those obtained for point estimation. We illustrate the finite-sample relevance of our asymptotic results through Monte Carlo exercises and also treat a real data example.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86092882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
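For the special case $p = 1$, $\ell = 1$, the estimator above is the spatial median, the minimiser of $\sum_i \lVert x_i - \mu \rVert$. The sketch below computes it with the classical Weiszfeld fixed-point iteration. This is a standard algorithm chosen here for illustration, not necessarily the authors' implementation, and the affine-equivariant $\ell = d$ (Oja-type) estimators are not shown. The four data points are hypothetical and were chosen so that the median is not one of the data points. At such a point the unit vectors $(x_i - \mu)/\lVert x_i - \mu \rVert$ must sum to zero.

```python
import numpy as np

def spatial_median(X, tol=1e-12, max_iter=10_000):
    """Weiszfeld iteration for argmin_mu sum_i ||x_i - mu||."""
    mu = X.mean(axis=0)                        # start at the sample mean (the p = 2 solution)
    for _ in range(max_iter):
        d = np.linalg.norm(X - mu, axis=1)
        d = np.maximum(d, 1e-15)               # crude guard if an iterate hits a data point
        w = 1.0 / d
        new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new - mu) < tol:
            return new
        mu = new
    return mu

def l1_objective(X, mu):
    return float(np.linalg.norm(X - mu, axis=1).sum())

X = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
m = spatial_median(X)
```

Note that the spatial median is equivariant under rotations and translations but not under general affine maps, which is the deficiency the $\ell = d$ estimators are designed to remove.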
We propose new estimation methods for time series models, possibly non-causal and/or non-invertible, using serial dependence information from the characteristic function of model residuals. This makes it possible to impose the iid or martingale difference assumptions on the model errors to identify the unknown location of the roots of the lag polynomials for ARMA models without resorting to higher order moments or distributional assumptions. We consider generalized spectral density and cumulative distribution functions to measure residual dependence at an increasing number of lags under both assumptions and discuss robust inference to higher order dependence when only mean independence is imposed on model errors. We study the consistency and asymptotic distribution of parameter estimates and discuss efficiency when different restrictions on error dependence are used simultaneously, including serial uncorrelation. Optimal weighting of continuous moment conditions yields maximum likelihood efficiency under independence for an unknown error distribution. We investigate numerical implementation and finite-sample properties of the new classes of estimates. distributional assumptions on model errors, Gaussian Pseudo Maximum Likelihood (PML) estimates based on least squares are typically prescribed. The Gaussian PML estimates in fact try to match the sample autocovariances of the data with the model-implied ones, or equivalently, minimize the magnitude of residual autocorrelations to match the zero serial correlation white noise assumption, which is equivalent to serial independence only under Gaussianity. Conditional moment-based models lead to unconditional moment restrictions using the uncorrelation of errors with past information described by instrumental variables (see e.g. the survey by Anatolyev, 2007).
These instruments are constructed with lags of observations and/or residuals, though these alternative representations of past information are not equivalent in general, for instance, when the true model is non-invertible.
{"title":"Estimation of time series models using residuals dependence measures","authors":"C. Velasco","doi":"10.1214/22-aos2220","DOIUrl":"https://doi.org/10.1214/22-aos2220","url":null,"abstract":"We propose new estimation methods for time series models, possibly non-causal and/or non-invertible, using serial dependence information from the characteristic function of model residuals. This makes it possible to impose the iid or martingale difference assumptions on the model errors to identify the unknown location of the roots of the lag polynomials for ARMA models without resorting to higher order moments or distributional assumptions. We consider generalized spectral density and cumulative distribution functions to measure residual dependence at an increasing number of lags under both assumptions and discuss robust inference to higher order dependence when only mean independence is imposed on model errors. We study the consistency and asymptotic distribution of parameter estimates and discuss efficiency when different restrictions on error dependence are used simultaneously, including serial uncorrelation. Optimal weighting of continuous moment conditions yields maximum likelihood efficiency under independence for an unknown error distribution. We investigate numerical implementation and finite-sample properties of the new classes of estimates. distributional assumptions on model errors, Gaussian Pseudo Maximum Likelihood (PML) estimates based on least squares are typically prescribed. The Gaussian PML estimates in fact try to match the sample autocovariances of the data with the model-implied ones, or equivalently, minimize the magnitude of residual autocorrelations to match the zero serial correlation white noise assumption, which is equivalent to serial independence only under Gaussianity. Conditional moment-based models lead to unconditional moment restrictions using the uncorrelation of errors with past information described by instrumental variables (see e.g. the survey by Anatolyev, 2007). These instruments are constructed with lags of observations and/or residuals, though these alternative representations of past information are not equivalent in general, for instance, when the true model is non-invertible.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90992273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
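The gap between zero serial correlation and serial independence, which the abstract stresses, can be made concrete with the characteristic-function covariance underlying generalized spectral measures: σ_j(u, v) = E[e^{i(u e_t + v e_{t−j})}] − E[e^{iu e_t}] E[e^{iv e_{t−j}}]. This sketch only shows the dependence measure itself; it does not implement the paper's estimators or its weighting of moment conditions. The grid of (u, v) values, the averaging of |σ_j|² over that grid and the simulated series are illustrative assumptions. The series e_t = z_t z_{t−1} with iid Gaussian z is uncorrelated at lag 1 but dependent.

```python
import numpy as np

def gen_cov(e, lag, u, v):
    """Empirical sigma_j(u, v) from the joint and marginal characteristic functions."""
    a, b = e[lag:], e[:-lag]                    # pairs (e_t, e_{t-lag})
    joint = np.mean(np.exp(1j * (u * a + v * b)))
    return joint - np.mean(np.exp(1j * u * a)) * np.mean(np.exp(1j * v * b))

def dependence_measure(e, lag, grid=(0.5, 1.0, 2.0)):
    """Average of |sigma_j(u, v)|^2 over a small (u, v) grid (assumed weighting)."""
    return float(np.mean([abs(gen_cov(e, lag, u, v)) ** 2 for u in grid for v in grid]))

rng = np.random.default_rng(1)
z = rng.normal(size=5001)
iid = z[1:]                                     # serially independent
dep = z[1:] * z[:-1]                            # uncorrelated but dependent at lag 1
lag1_corr = float(np.corrcoef(dep[1:], dep[:-1])[0, 1])
d_iid = dependence_measure(iid, 1)
d_dep = dependence_measure(dep, 1)
```

The lag-1 autocorrelation of `dep` is close to zero, so a Gaussian PML criterion built on autocorrelations cannot see this dependence. The characteristic-function measure, by contrast, is markedly larger for `dep` than for `iid`.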
It is well known that estimating the bivariate cumulative distribution function of a pair of right-censored lifetimes presents challenges with no parallel in the univariate case, where the product-limit Kaplan-Meier methodology typically yields optimal estimation, and the literature on optimal estimation of the joint probability density is nearly nonexistent. The paper, for the first time in the survival analysis literature, develops the theory and methodology of sharp minimax and adaptive nonparametric estimation of the joint density under the mean integrated squared error (MISE) criterion. The theory shows how an underlying joint density, together with the bivariate distribution of censoring variables, affects the estimation, and what can or cannot be estimated in the presence of censoring, and how. A practical example illustrates the problem.
{"title":"Nonparametric bivariate density estimation for censored lifetimes","authors":"S. Efromovich","doi":"10.1214/22-aos2209","DOIUrl":"https://doi.org/10.1214/22-aos2209","url":null,"abstract":"It is well known that estimating the bivariate cumulative distribution function of a pair of right-censored lifetimes presents challenges with no parallel in the univariate case, where the product-limit Kaplan-Meier methodology typically yields optimal estimation, and the literature on optimal estimation of the joint probability density is nearly nonexistent. The paper, for the first time in the survival analysis literature, develops the theory and methodology of sharp minimax and adaptive nonparametric estimation of the joint density under the mean integrated squared error (MISE) criterion. The theory shows how an underlying joint density, together with the bivariate distribution of censoring variables, affects the estimation, and what can or cannot be estimated in the presence of censoring, and how. A practical example illustrates the problem.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79027463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
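The univariate benchmark mentioned above, the product-limit (Kaplan-Meier) estimator, takes only a few lines. At each distinct event time t it multiplies the running survival estimate by 1 − d_t / n_t, where d_t is the number of events at t and n_t the number still at risk. This sketch covers the univariate case only and is not the paper's bivariate minimax density estimator. The function name and toy data are illustrative.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct observed event time.

    times  : observed times min(T, C)
    events : 1 if the event was observed, 0 if right-censored
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    event_times = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in event_times:
        at_risk = np.sum(t >= u)                   # still under observation just before u
        deaths = np.sum((t == u) & (d == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

t_km, s_km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Here the subject censored at time 2 still counts in the risk set at time 1 but not at time 3. This gives survival estimates 0.75, 0.375 and 0.0 at times 1, 3 and 4.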
{"title":"Approximate kernel PCA: Computational versus statistical trade-off","authors":"Bharath K. Sriperumbudur, Nicholas Sterge","doi":"10.1214/22-aos2204","DOIUrl":"https://doi.org/10.1214/22-aos2204","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76512264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}