Variance estimation is important for statistical inference. It becomes nontrivial when observations are masked by serial dependence structures and time-varying mean structures. Existing methods either ignore or sub-optimally handle these nuisance structures. This paper develops a general framework for the estimation of the long-run variance for time series with nonconstant means. The building blocks are difference statistics. The proposed class of estimators is general enough to cover many existing estimators. Necessary and sufficient conditions for consistency are investigated. The first asymptotically optimal estimator is derived. Our proposed estimator is theoretically proven to be invariant to arbitrary mean structures, which may include trends and a possibly divergent number of discontinuities.
{"title":"Optimal difference-based variance estimators in time series: A general framework","authors":"Kin Wai Chan","doi":"10.1214/21-aos2154","DOIUrl":"https://doi.org/10.1214/21-aos2154","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"43 1","pages":""},"publicationDate":"2021-12-30","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84555122"}
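To make the idea of a difference statistic concrete, here is a minimal sketch of the classical lag-1 difference-based variance estimator (half the mean squared successive difference). It is not the estimator proposed in the paper: it targets the marginal variance under independent noise, whereas the paper treats the long-run variance under serial dependence. The simulated mean function (a linear trend plus one jump) and all names are assumptions chosen for illustration.

```python
import random


def lag1_difference_variance(x):
    """Half the mean squared successive difference (classical lag-1 estimator)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))


def sample_variance(x):
    """Usual sample variance, which assumes a constant mean."""
    n = len(x)
    m = sum(x) / n
    return sum((v - m) ** 2 for v in x) / (n - 1)


# Hypothetical data: unit-variance independent noise around a nonconstant mean.
rng = random.Random(0)
n = 2000
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
mean = [3.0 * i / n + (2.0 if i >= n // 2 else 0.0) for i in range(n)]
x = [m + e for m, e in zip(mean, noise)]

diff_est = lag1_difference_variance(x)   # differencing cancels most of the mean
naive_est = sample_variance(x)           # inflated by the trend and the jump
print(diff_est, naive_est)
```

Differencing removes a slowly varying mean almost entirely, and a single jump contributes only one contaminated difference, which is why the difference-based value stays near the noise variance while the ordinary sample variance does not.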
Positive dependence is present in many real-world data sets and has appealing stochastic properties that can be exploited in statistical modeling and in estimation. In particular, the notion of multivariate total positivity of order 2 ($\mathrm{MTP}_{2}$) is a convex constraint and acts as an implicit regularizer in the Gaussian case. We study positive dependence in multivariate extremes and introduce $\mathrm{EMTP}_{2}$, an extremal version of $\mathrm{MTP}_{2}$. This notion turns out to appear prominently in extremes, and in fact, it is satisfied by many classical models. For a Hüsler–Reiss distribution, the analogue of a Gaussian distribution in extremes, we show that it is $\mathrm{EMTP}_{2}$ if and only if its precision matrix is a Laplacian of a connected graph. We propose an estimator for the parameters of the Hüsler–Reiss distribution under $\mathrm{EMTP}_{2}$ as the solution of a convex optimization problem with Laplacian constraint. We prove that this estimator is consistent and typically yields a sparse model with possibly nondecomposable extremal graphical structure. Applying our methods to a data set of Danube River flows, we illustrate this regularization and the superior performance compared to existing methods.
{"title":"Total positivity in multivariate extremes","authors":"Frank Rottger, Sebastian Engelke, Piotr Zwiernik","doi":"10.1214/23-aos2272","DOIUrl":"https://doi.org/10.1214/23-aos2272","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"9 1","pages":""},"publicationDate":"2021-12-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82875264"}
Ilmun Kim, Matey Neykov, Sivaraman Balakrishnan, L. Wasserman
In this paper, we investigate local permutation tests for testing conditional independence between two random vectors X and Y given Z. The local permutation test determines the significance of a test statistic by locally shuffling samples which share similar values of the conditioning variables Z, and it forms a natural extension of the usual permutation approach for unconditional independence testing. Despite its simplicity and empirical support, the theoretical underpinnings of the local permutation test remain unclear. Motivated by this gap, this paper aims to establish theoretical foundations of local permutation tests with a particular focus on binning-based statistics. We start by revisiting the hardness of conditional independence testing and provide an upper bound for the power of any valid conditional independence test, which holds when the probability of observing “collisions” in Z is small. This negative result naturally motivates us to impose additional restrictions on the possible distributions under the null and alternative. To this end, we focus our attention on certain classes of smooth distributions and identify provably tight conditions under which the local permutation method is universally valid, i.e., it is valid when applied to any (binning-based) test statistic. To complement this result on type I error control, we also show that in some cases, a binning-based statistic calibrated via the local permutation method can achieve minimax optimal power. We also introduce a double-binning permutation strategy, which yields a valid test over less smooth null distributions than the typical single-binning method without compromising much power. Finally, we present simulation results to support our theoretical
{"title":"Local permutation tests for conditional independence","authors":"Ilmun Kim, Matey Neykov, Sivaraman Balakrishnan, L. Wasserman","doi":"10.1214/22-aos2233","DOIUrl":"https://doi.org/10.1214/22-aos2233","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"62 1","pages":""},"publicationDate":"2021-12-22","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"85565845"}
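The local shuffling step described in the abstract above can be sketched as follows for a scalar conditioning variable: Z is cut into bins, Y is permuted only within each bin, and the observed statistic is compared with its permuted values. The equal-width binning, the particular statistic (summed absolute within-bin cross-products), the function names, and the simulated data are all assumptions made for illustration, not the paper's construction.

```python
import random
from collections import defaultdict


def bin_indices(z, n_bins):
    """Group sample indices into equal-width bins of the conditioning variable z."""
    lo, hi = min(z), max(z)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if z is constant
    bins = defaultdict(list)
    for i, zi in enumerate(z):
        bins[min(int((zi - lo) / width), n_bins - 1)].append(i)
    return list(bins.values())


def binned_abs_covariance(x, y, bins):
    """Illustrative binning-based statistic: sum of |within-bin cross-products|."""
    total = 0.0
    for idx in bins:
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        total += abs(sum((x[i] - mx) * (y[i] - my) for i in idx))
    return total


def local_permutation_pvalue(x, y, z, n_bins=10, n_perm=200, seed=0):
    """Permutation p-value where y is shuffled only among samples sharing a bin of z."""
    rng = random.Random(seed)
    bins = bin_indices(z, n_bins)
    observed = binned_abs_covariance(x, y, bins)
    exceed = 0
    for _ in range(n_perm):
        y_perm = list(y)
        for idx in bins:
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for src, dst in zip(idx, shuffled):
                y_perm[dst] = y[src]
        if binned_abs_covariance(x, y_perm, bins) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Hypothetical data: x and y_null depend on z only; y_dep also depends on x.
rng = random.Random(1)
n = 300
z = [rng.random() for _ in range(n)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]
y_null = [zi + rng.gauss(0.0, 1.0) for zi in z]
y_dep = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]

p_null = local_permutation_pvalue(x, y_null, z)
p_dep = local_permutation_pvalue(x, y_dep, z)
print(p_null, p_dep)
```

Adding one to both the numerator and the denominator is the usual finite-sample correction for permutation p-values; it keeps the p-value strictly positive.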
Yuichi Goto, Tobias Kley, Ria Van Hecke, S. Volgushev, H. Dette, M. Hallin
Frequency domain methods form a ubiquitous part of the statistical toolbox for time series analysis. In recent years, considerable interest has been given to the development of new spectral methodology and tools capturing dynamics in the entire joint distributions and thus avoiding the limitations of classical, $L^2$-based spectral methods. Most of the spectral concepts proposed in that literature suffer from one major drawback, though: their estimation requires the choice of a smoothing parameter, which has a considerable impact on estimation quality and poses challenges for statistical inference. In this paper, associated with the concept of copula-based spectrum, we introduce the notion of copula spectral distribution function or integrated copula spectrum. This integrated copula spectrum retains the advantages of copula-based spectra but can be estimated without the need for smoothing parameters. We provide such estimators, along with a thorough theoretical analysis, based on a functional central limit theorem, of their asymptotic properties. We leverage these results to test various hypotheses that cannot be addressed by classical spectral methods, such as the lack of time-reversibility or asymmetry in tail dynamics.
{"title":"The integrated copula spectrum","authors":"Yuichi Goto, Tobias Kley, Ria Van Hecke, S. Volgushev, H. Dette, M. Hallin","doi":"10.1214/22-AOS2240","DOIUrl":"https://doi.org/10.1214/22-AOS2240","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"38 1","pages":""},"publicationDate":"2021-12-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"76479775"}
Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming and many others as particular instances. The re-characterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators as the surrogate’s fixed points, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness by carefully controlling stepsize and relaxation parameters.
{"title":"Analysis of generalized Bregman surrogate algorithms for nonsmooth nonconvex statistical learning","authors":"Yiyuan She, Zhifeng Wang, Jiuwu Jin","doi":"10.1214/21-aos2090","DOIUrl":"https://doi.org/10.1214/21-aos2090","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"509 1","pages":""},"publicationDate":"2021-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"76404726"}
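Iterative thresholding is one of the instances named in the abstract above. As a hedged illustration of that single instance only, the sketch below runs iterative soft-thresholding on the convex lasso objective 0.5*||y - Xb||^2 + lam*||b||_1. It does not implement the generalized Bregman framework, the nonconvex penalties, or the momentum schemes studied in the paper; the function names and the step-size bound (squared Frobenius norm of X) are assumptions chosen for simplicity.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(X, y, lam, n_iter=500):
    """Iterative soft-thresholding for 0.5*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    # Squared Frobenius norm: a simple upper bound on the largest eigenvalue of X'X.
    L = sum(v * v for row in X for v in row)
    b = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth part, then thresholding for the penalty.
        b = [soft_threshold(b[j] - grad[j] / L, lam / L) for j in range(p)]
    return b


# With an identity design the lasso solution is soft_threshold(y_j, lam) per coordinate.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.5, -2.0]
b_hat = ista_lasso(X, y, lam=1.0)
print(b_hat)
```

The identity design is used only because its solution is known in closed form, which makes the iteration easy to check by hand.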
{"title":"On fixed-domain asymptotics, parameter estimation and isotropic Gaussian random fields with Matérn covariance functions","authors":"Wei-Liem Loh, Saifei Sun, J. Wen","doi":"10.1214/21-aos2077","DOIUrl":"https://doi.org/10.1214/21-aos2077","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"9 1","pages":""},"publicationDate":"2021-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82034007"}
Manski’s celebrated maximum score estimator for the discrete choice model, which is an optimal linear discriminator, has been the focus of much investigation in both the econometrics and statistics literatures, but its behavior under growing dimension scenarios largely remains unknown. This paper addresses that gap. Two different cases are considered: $p$ grows with $n$ but at a slow rate, i.e., $p/n \to 0$; and $p \gg n$ (fast growth). In the binary response model, we recast Manski’s score estimation as empirical risk minimization for a classification problem, and derive the $\ell_2$ rate of convergence of the score estimator under a new transition condition in terms of a margin parameter that calibrates the level of difficulty of the estimation problem. We also establish upper and lower bounds for the minimax $\ell_2$ error in the binary choice model that differ by a logarithmic factor, and construct a minimax-optimal estimator in the slow growth regime. Some extensions to the multinomial choice model are also considered.
{"title":"Optimal linear discriminators for the discrete choice model in growing dimensions","authors":"Debarghya Mukherjee, M. Banerjee","doi":"10.1214/21-aos2085","DOIUrl":"https://doi.org/10.1214/21-aos2085","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"29 1","pages":""},"publicationDate":"2021-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"80166810"}
Johannes Schmidt-Hieber, Laura Fee Schneider, Thomas Staudt, A. Krajina, Timo Aspelmeier, Axel Munk
Estimation of the population size n from k i.i
{"title":"Posterior analysis of n in the binomial (n,p) problem with both parameters unknown—with applications to quantitative nanoscopy","authors":"Johannes Schmidt-Hieber, Laura Fee Schneider, Thomas Staudt, A. Krajina, Timo Aspelmeier, Axel Munk","doi":"10.1214/21-aos2096","DOIUrl":"https://doi.org/10.1214/21-aos2096","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"64 1","pages":""},"publicationDate":"2021-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"80336383"}
We consider a potential outcomes model in which interference may be present between any two units but the extent of interference diminishes with spatial distance. The causal estimand is the global average treatment effect, which compares counterfactual outcomes when all units are treated to those when none are. We study a class of designs in which space is partitioned into clusters that are randomized into treatment and control. For each design, we estimate the treatment effect using a Horvitz-Thompson estimator that compares the average outcomes of units with all neighbors treated to units with no treated neighbors, where the neighborhood radius is of the same order as the cluster size dictated by the design. We derive the estimator’s rate of convergence as a function of the design and degree of interference and use this to obtain estimator-design pairs that achieve near-optimal rates of convergence under relatively minimal assumptions on interference. We prove that the estimators are asymptotically normal and provide a variance estimator. For practical implementation of the designs, we suggest partitioning space using clustering algorithms. [...] only be directly observed in the data under an extreme design that assigns all units to the same treatment arm, which would necessarily preclude observation of the other counterfactual. Common designs used in the literature, including those studied here, assign different units to different treatment arms, so neither average is directly observed in the data. Nonetheless, we show that asymptotic inference on $\theta_n$ is possible for a class of cluster-randomized designs under spatial interference where the degree of interference diminishes with distance.
{"title":"Rate-optimal cluster-randomized designs for spatial interference","authors":"Michael P. Leung","doi":"10.1214/22-aos2224","DOIUrl":"https://doi.org/10.1214/22-aos2224","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"60 1","pages":""},"publicationDate":"2021-11-08","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"84708981"}
Modeling univariate block maxima by the generalized extreme value distribution constitutes one of the most widely applied approaches in extreme value statistics. It has recently been found that, for an underlying stationary time series, respective estimators may be improved by calculating block maxima in an overlapping way. A proof of concept is provided that the latter finding also holds in situations that involve certain piecewise stationarities. A weak convergence result for an empirical process of central interest is provided, and, as a case in point, further details are worked out explicitly for the probability weighted moment estimator. Irrespective of the serial dependence, the estimation variance is shown to be smaller for the new estimator, while the bias was found to be the same or to vary comparably little in extensive simulation experiments. The results are illustrated by Monte Carlo simulation experiments and are applied to a common situation involving temperature extremes in a changing climate.
{"title":"On the disjoint and sliding block maxima method for piecewise stationary time series","authors":"Axel Bucher, L. Zanger","doi":"10.1214/23-aos2260","DOIUrl":"https://doi.org/10.1214/23-aos2260","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"90 1","pages":""},"publicationDate":"2021-10-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"79403685"}
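The difference between computing block maxima in a disjoint way and in an overlapping (sliding) way, as contrasted in the abstract above, can be shown in a few lines. This sketch only extracts the two samples of maxima; the generalized extreme value fit and the probability weighted moment estimator are not shown. The function names and the toy series are assumptions for illustration.

```python
def disjoint_block_maxima(x, r):
    """Maxima over consecutive non-overlapping blocks of length r (incomplete tail dropped)."""
    return [max(x[i:i + r]) for i in range(0, len(x) - r + 1, r)]


def sliding_block_maxima(x, r):
    """Maxima over every window of length r, moving forward one observation at a time."""
    return [max(x[i:i + r]) for i in range(len(x) - r + 1)]


series = [3, 1, 4, 1, 5, 9, 2, 6]
disjoint = disjoint_block_maxima(series, 3)
sliding = sliding_block_maxima(series, 3)
print(disjoint)  # [4, 9]
print(sliding)   # [4, 4, 5, 9, 9, 9]
```

A series of length n yields about n/r disjoint maxima but n - r + 1 sliding maxima. The sliding maxima are strongly dependent on one another, so the larger sample does not translate into a proportional gain in information.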