On the choice of the splitting ratio for the split likelihood ratio test
David Strieder, M. Drton. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2099
Abstract: The recently introduced framework of universal inference provides a new approach to constructing hypothesis tests and confidence regions that are valid in finite samples and do not rely on any specific regularity assumptions on the underlying statistical model. At the core of the methodology is a split likelihood ratio statistic, which is formed under data splitting and compared to a cleverly selected universal critical value. As this critical value can be very conservative, it is of interest to mitigate the potential loss of power through a careful choice of the ratio according to which the data are split. Motivated by this problem, we study the split likelihood ratio test under local alternatives and introduce the resulting class of noncentral split chi-square distributions. We investigate the properties of this new class of distributions and use it to numerically examine, and propose, an optimal choice of the data splitting ratio for tests of composite hypotheses of different dimensions.
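To make the role of the splitting ratio concrete, here is a minimal Python sketch of a split likelihood ratio test in an assumed Gaussian location model N(μ, I_d) with known identity covariance, testing the composite null that the last r coordinates of μ are zero. It follows the generic universal-inference recipe: fit an unrestricted estimator on one part D1 of the data, evaluate the likelihood ratio against the restricted MLE on the other part D0, and reject when the ratio reaches 1/α. The model, the function names `split_lrt` and `power`, and the convention that `split_ratio` is the fraction of observations assigned to D1 are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Illustrative sketch; model, names and the split_ratio convention are hypothetical.


def split_lrt(x, r, alpha=0.05, split_ratio=0.5, rng=None):
    """Split likelihood ratio test of H0: the last r mean coordinates are zero.

    Assumed model: rows of x are i.i.d. N(mu, I_d).
    split_ratio: fraction of observations placed in D1, the part used to fit the
    unrestricted estimator; the remaining part D0 evaluates the likelihood ratio.
    Returns (log statistic, reject?).
    """
    rng = np.random.default_rng(rng)
    n, dim = x.shape
    n1 = min(max(int(round(split_ratio * n)), 1), n - 1)  # keep both parts nonempty
    idx = rng.permutation(n)
    fit_part, eval_part = x[idx[:n1]], x[idx[n1:]]

    mu1 = fit_part.mean(axis=0)      # unrestricted estimate, fitted on D1 only
    mu0 = eval_part.mean(axis=0)
    mu0[dim - r:] = 0.0              # restricted MLE under H0, fitted on D0

    # log L_{D0}(mu1) - log L_{D0}(mu0); the Gaussian normalising constants cancel
    log_t = 0.5 * (np.sum((eval_part - mu0) ** 2) - np.sum((eval_part - mu1) ** 2))
    # Universal critical value: reject when the ratio is at least 1 / alpha
    return log_t, bool(log_t >= np.log(1.0 / alpha))


def power(mu, n, r, split_ratio, alpha=0.05, reps=1000, seed=0):
    """Monte Carlo rejection rate of split_lrt when the data are N(mu, I_d)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=(n, mu.size)) + mu
        rejections += split_lrt(x, r, alpha, split_ratio, rng)[1]
    return rejections / reps
```

Evaluating `power` for a small local alternative over several values of `split_ratio` gives a rough Monte Carlo view of the power trade-off the abstract refers to; the sketch itself makes no claim about where the optimal ratio lies.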
Generalized maximum likelihood estimation of the mean of parameters of mixtures. With applications to sampling and to observational studies
E. Greenshtein, Ya'acov Ritov. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2082
Multi-sample comparison using spatial signs for infinite dimensional data
Joydeep Chowdhury, P. Chaudhuri. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2054
Abstract: We consider an analysis-of-variance type problem in which the sample observations are random elements of an infinite-dimensional space. This setting covers the case where the observations are random functions. For such a problem, we propose a test based on spatial signs. We develop an asymptotic implementation, a bootstrap implementation, and a permutation implementation of this test and investigate their size and power properties. We compare the performance of our test with that of several mean-based tests of analysis of variance for functional data studied in the literature. Interestingly, our test outperforms the mean-based tests not only in several non-Gaussian models with heavy-tailed or skewed distributions, but also in some Gaussian models. We also compare the performance of our test with the mean-based tests in several models involving contaminated probability distributions. Finally, we demonstrate the performance of these tests on three real datasets: a Canadian weather dataset, a spectrometric dataset on the chemical analysis of meat samples, and a dataset of orthotic measurements on volunteers.
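The spatial sign of a vector v is v/‖v‖ (and 0 at v = 0). The sketch below is a hypothetical illustration, not the statistic of Chowdhury and Chaudhuri: it shows how spatial signs can drive a permutation k-sample test for curves observed on a common equispaced grid, where the Euclidean norm of the grid values stands in for the L2 norm (the constant grid-spacing factor cancels in the sign). Each curve receives a spatial rank, the average spatial sign of its differences from all pooled curves, and the statistic sums the group-size-weighted squared norms of the group-wise average ranks. The function names and the exact form of the statistic are assumptions.

```python
import numpy as np

# Illustrative sketch; the statistic and all names are hypothetical.


def spatial_ranks(x):
    """Spatial rank of each row of x within the pooled sample.

    x: (N, p) array, each row a curve evaluated on a common equispaced grid.
    The spatial rank of x_i is the mean spatial sign of x_i - x_k over all k.
    """
    x = np.asarray(x, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]                     # (N, N, p)
    norms = np.linalg.norm(diffs, axis=-1, keepdims=True)
    signs = np.divide(diffs, norms, out=np.zeros_like(diffs), where=norms > 0)
    return signs.mean(axis=1)


def group_statistic(ranks, labels):
    """Sum over groups of n_j * ||mean spatial rank in group j||^2."""
    stat = 0.0
    for g in np.unique(labels):
        in_group = labels == g
        stat += in_group.sum() * np.sum(ranks[in_group].mean(axis=0) ** 2)
    return stat


def spatial_sign_test(x, labels, n_perm=999, seed=0):
    """Permutation p-value; the ranks are fixed and only the labels are shuffled."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    ranks = spatial_ranks(x)
    observed = group_statistic(ranks, labels)
    exceed = sum(
        group_statistic(ranks, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Because the spatial ranks depend only on the pooled sample, they are computed once; each permutation only regroups them, which keeps the test cheap even for many permutations.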
Truncated sum-of-squares estimation of fractional time series models with generalized power law trend
J. Hualde, M. Nielsen. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2009
Abstract: We consider truncated (or conditional) sum-of-squares estimation of a parametric fractional time series model with an additive deterministic structure. The latter consists of both a drift term and a generalized power law trend. The memory parameter of the stochastic component and the power parameter of the deterministic trend component are both treated as unknown real numbers to be estimated, belonging to arbitrarily large compact sets. Thus, our model captures different forms of nonstationarity and noninvertibility as well as a very flexible deterministic specification. As in related settings, the proof of consistency (which is a prerequisite for proving asymptotic normality) is challenging due to the non-uniform convergence of the objective function over a large admissible parameter space and due to the competition between the stochastic and deterministic components. As expected, parameter estimates related to the deterministic component are shown to be consistent and asymptotically normal only on parts of the parameter space, depending on the relative strength of the stochastic and deterministic components. In contrast, we establish consistency and asymptotic normality of parameter estimates related to the stochastic component over the entire parameter space. Furthermore, the asymptotic distribution of the latter estimates is unaffected by the presence of the deterministic component, even when that component is not consistently estimable. We also include Monte Carlo simulations to illustrate our results.
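The truncated sum-of-squares idea can be sketched for the purely stochastic part of such a model. Assuming, for illustration only, that there is no drift, no power-law trend and no short-memory component, so that x_t = Δ_+^{-d} ε_t, the truncated fractional difference Δ_+^{d} x_t = Σ_{j=0}^{t-1} π_j(d) x_{t-j} (with pre-sample values set to zero) recovers the innovations at the true d, and d can be estimated by minimizing the mean of the squared truncated residuals. The recursion π_0 = 1, π_j = π_{j-1}(j − 1 − d)/j is the standard expansion of (1 − L)^d; the grid search, its range, and the function names are hypothetical choices, and the sketch does not reproduce the authors' treatment of the deterministic component.

```python
import numpy as np

# Illustrative sketch without deterministic terms; names and grid are hypothetical.


def frac_coeffs(d, n):
    """First n coefficients pi_j(d) of (1 - L)^d: pi_0 = 1, pi_j = pi_{j-1} (j - 1 - d) / j."""
    j = np.arange(1, n)
    return np.concatenate(([1.0], np.cumprod((j - 1 - d) / j)))


def frac_diff(x, d):
    """Truncated fractional difference: entry t is sum_{j=0}^{t} pi_j(d) x_{t-j}.

    Observations before the start of the sample are treated as zero, which is
    what makes the filter truncated.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.convolve(x, frac_coeffs(d, n))[:n]


def css_estimate(x, grid=None):
    """Grid-search minimiser over d of the truncated sum of squares (no trend terms)."""
    if grid is None:
        grid = np.linspace(-0.5, 1.5, 401)   # illustrative compact set for d
    objective = [np.mean(frac_diff(x, d) ** 2) for d in grid]
    return float(grid[int(np.argmin(objective))])
```

A series with memory parameter d can be simulated as `frac_diff(eps, -d)` for white noise `eps`, since the truncated operators Δ_+^{d} and Δ_+^{-d} compose exactly to the identity; the same estimator applies whether d lies in the stationary or the nonstationary region.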
Adaptive threshold-based classification of sparse high-dimensional data
T. Pavlenko, N. Stepanova, Lee Thompson. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs1998
Abstract: We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes a local dependence structure among the feature variables, represented by a block-diagonal covariance matrix with a growing number of blocks of arbitrary but fixed size. The blocks correspond to non-overlapping, independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of “signal strength” for each feature block. This measure is then used to specify the sparse model of interest. We further propose a threshold-based feature selector that operates as a screen-and-clean scheme integrated into a linear classifier: the data are subjected to screening and hard-threshold cleaning to filter out the blocks that contain no signal. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b and tends to infinity with b, but at a slower rate than b. The new classifiers, which are fully adaptive to the unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms the good analytical properties of the new classifiers, which compare favorably with the existing threshold-based procedure used in a similar context.
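A block-wise hard-threshold classifier in this spirit can be sketched as follows. This is a hypothetical simplification, not the authors' screen-and-clean procedure: it collapses screening and cleaning into a single hard threshold, measures a block's signal strength by the estimated squared Mahalanobis distance δ_kᵀ Σ_k⁻¹ δ_k between the class means within block k, and applies Fisher's linear discriminant to the retained blocks. The threshold τ = (1/n₀ + 1/n₁)(s + 2√(2 s log b) + 4 log b), with s the block size, is our own assumption, motivated by the Laurent-Massart tail bound for a χ²_s variable evaluated at x = 2 log b; it ignores the error from estimating Σ_k.

```python
import numpy as np

# Illustrative sketch; the signal measure, threshold and names are hypothetical.


def fit_screen_and_clean(x0, x1, block_size):
    """Hard-threshold block selection followed by Fisher LDA on the kept blocks.

    x0, x1: (n0, p) and (n1, p) training samples of the two classes; consecutive
    runs of block_size columns are assumed to form the covariance blocks.
    Returns (weights, midpoint, selected block indices).
    """
    n0, n1 = len(x0), len(x1)
    p = x0.shape[1]
    b = p // block_size
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    centered = np.vstack([x0 - m0, x1 - m1])

    # Hypothetical threshold: chi-square tail bound at x = 2 log b, scaled by the
    # variance factor of the mean difference (covariance-estimation error ignored)
    scale = 1.0 / n0 + 1.0 / n1
    t = 2.0 * np.log(b)
    tau = scale * (block_size + 2.0 * np.sqrt(block_size * t) + 2.0 * t)

    w = np.zeros(p)
    selected = []
    for k in range(b):
        cols = slice(k * block_size, (k + 1) * block_size)
        block = centered[:, cols]
        cov = block.T @ block / (n0 + n1 - 2)   # pooled within-class covariance
        delta = m1[cols] - m0[cols]
        w_k = np.linalg.solve(cov, delta)
        if delta @ w_k > tau:                    # estimated signal strength of block k
            selected.append(k)
            w[cols] = w_k                        # discarded blocks keep weight 0
    return w, (m0 + m1) / 2.0, selected


def predict(w, midpoint, x):
    """Assign class 1 when the discriminant score is positive (equal priors assumed)."""
    return ((x - midpoint) @ w > 0).astype(int)
```

Restricting the discriminant to the retained blocks is what keeps the many noise blocks from accumulating estimation error in the classification score.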
Optimal estimation of the supremum and occupation times of a self-similar Lévy process
J. Ivanovs, M. Podolskij. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/21-ejs1928
Abstract: In this paper we present new theoretical results on the optimal estimation of certain random quantities based on high-frequency observations of a Lévy process. More specifically, we investigate the asymptotic theory for the conditional mean and conditional median estimators of the supremum/infimum of a linear Brownian motion and of a strictly stable Lévy process. Another contribution of our article is the conditional mean estimation of the local time and the occupation time of a linear Brownian motion. We demonstrate that the new estimators are considerably more efficient than the classical estimators studied in, e.g., [6, 14, 29, 30, 38]. Furthermore, we discuss pre-estimation of the parameters of the underlying models, which is required for a practical implementation of the proposed statistics. MSC2020 subject classifications: Primary 62M05, 62G20, 60F05; secondary 62G15, 60G18, 60G51.
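For a linear Brownian motion observed at spacing Δ, the conditional mean of the supremum given the observations can be computed from a classical fact: between consecutive observations a and b the path is a Brownian bridge, whose law does not depend on the drift, and P(max ≤ m | a, b) = 1 − exp(−2(m − a)(m − b)/(σ²Δ)) for m ≥ max(a, b). The bridges are conditionally independent, so E[sup | data] = M₀ + ∫_{M₀}^∞ (1 − Π_i P_i(m)) dm, where M₀ is the largest observation. The sketch below evaluates this integral with the trapezoidal rule, assuming σ is known (in practice it must be pre-estimated, as the abstract notes). It covers only the Brownian case, not the stable Lévy case or the conditional median, and the function name and truncation `width` are hypothetical.

```python
import numpy as np

# Illustrative sketch for linear Brownian motion with known sigma; names are hypothetical.


def conditional_mean_sup(x, sigma, dt, n_grid=4001, width=8.0):
    """E[sup of the path | observations x_0, ..., x_n at spacing dt].

    The drift does not enter because Brownian bridges do not depend on it.
    """
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]                     # endpoints of each observation interval
    m0 = x.max()
    # The integrand is negligible beyond m0 + width * sigma * sqrt(dt)
    m = np.linspace(m0, m0 + width * sigma * np.sqrt(dt), n_grid)
    # log P(bridge max <= m | a, b) = log(1 - exp(-2 (m - a)(m - b) / (sigma^2 dt)))
    expo = -2.0 * (m[:, None] - a) * (m[:, None] - b) / (sigma ** 2 * dt)
    with np.errstate(divide="ignore"):       # log(0) = -inf where m equals an interval's max endpoint
        log_cdf = np.log1p(-np.exp(expo)).sum(axis=1)
    survival = 1.0 - np.exp(log_cdf)         # P(sup > m | data), bridges independent
    step = m[1] - m[0]
    # E[sup | data] = m0 + integral of the survival function (trapezoidal rule)
    return m0 + step * (survival.sum() - 0.5 * (survival[0] + survival[-1]))
```

As a sanity check, for a single interval with both endpoints at 0 and σ²Δ = 1 the estimate should be close to the expected maximum of a standard Brownian bridge, √(π/8) ≈ 0.6267.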
Minimal σ-field for flexible sufficient dimension reduction
Hanmin Guo, Lin Hou, Y. Zhu. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs1999
Abstract: Sufficient Dimension Reduction (SDR) has become an important tool for mitigating the curse of dimensionality in high-dimensional regression analysis. Recently, Flexible SDR (FSDR) has been proposed to extend SDR by finding lower-dimensional projections of transformed explanatory variables. However, the dimensions of the projections cannot fully represent the extent of data reduction that FSDR can achieve. As a consequence, the optimality and other theoretical properties of FSDR are currently not well understood. In this article, we propose to use the σ-field associated with the projections, together with their dimensions, to fully characterize FSDR, and we refer to this σ-field as the FSDR σ-field. We further introduce the concept of the minimal FSDR σ-field and regard FSDR projections with the minimal σ-field as optimal. Under some mild conditions, we show that the minimal FSDR σ-field exists and attains the lowest dimensionality at the same time. To estimate the minimal FSDR σ-field, we propose a two-stage procedure called the Generalized Kernel Dimension Reduction (GKDR) method and partially establish its consistency under weak conditions. Extensive simulation experiments demonstrate that the GKDR method can effectively find the minimal FSDR σ-field and outperforms other existing methods. The application of GKDR to a real-life air pollution data set sheds new light on the connections between atmospheric conditions and air quality. MSC2020 subject classifications: Primary 62B05; secondary 62J02.
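The abstract does not spell out the two-stage GKDR procedure, so the sketch below uses a loosely related, standard technique as a stand-in: gradient-based kernel dimension reduction (gKDR) of Fukumizu and Leng, applied to untransformed covariates. With Gaussian Gram matrices G_X and G_Y and a ridge parameter ε, it forms M = (1/n) Σ_i ∇k_X(X_i)ᵀ (G_X + nεI)⁻¹ G_Y (G_X + nεI)⁻¹ ∇k_X(X_i) and returns the leading eigenvectors of M as an estimated basis of the reduction subspace. It estimates directions, not a σ-field, and the median-distance bandwidths, ε = 10⁻³ and the function names are hypothetical defaults.

```python
import numpy as np

# Stand-in sketch of gradient-based KDR, not the paper's GKDR; names are hypothetical.


def _sq_dists(z):
    """Matrix of squared Euclidean distances between the rows of z."""
    return np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)


def _median_bandwidth(sq):
    """Median heuristic: median pairwise distance over distinct pairs."""
    return np.median(np.sqrt(sq[np.triu_indices(len(sq), k=1)]))


def gkdr(x, y, n_dirs, eps=1e-3):
    """Return a (p, n_dirs) matrix whose columns estimate the reduction directions."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    y = np.asarray(y, dtype=float).reshape(n, -1)
    sq_x, sq_y = _sq_dists(x), _sq_dists(y)
    hx, hy = _median_bandwidth(sq_x), _median_bandwidth(sq_y)
    gx = np.exp(-sq_x / (2.0 * hx ** 2))
    gy = np.exp(-sq_y / (2.0 * hy ** 2))

    # A = (G_X + n eps I)^{-1} G_Y (G_X + n eps I)^{-1}, via two linear solves
    reg = gx + n * eps * np.eye(n)
    a = np.linalg.solve(reg, np.linalg.solve(reg, gy).T)

    m = np.zeros((p, p))
    for i in range(n):
        # Row j: gradient of k(., X_j) at X_i, i.e. -(X_i - X_j) / hx^2 * k(X_i, X_j)
        grad = -(x[i] - x) / hx ** 2 * gx[i][:, None]
        m += grad.T @ a @ grad
    m /= n

    _, vecs = np.linalg.eigh(m)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :n_dirs]
```

The two `solve` calls avoid forming an explicit inverse of the regularized Gram matrix; since that matrix and G_Y are symmetric, the result equals the product in the formula above.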
Estimation of partially conditional average treatment effect by double kernel-covariate balancing
Jiayi Wang, R. K. Wong, Shu Yang, K. C. G. Chan. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2000
Isotonic regression for elicitable functionals and their Bayes risk
Anja Mühlemann, Johanna F. Ziegel. Electronic Journal of Statistics, 2022. https://doi.org/10.1214/22-ejs2034