{"title":"Deep learning for the partially linear Cox model","authors":"Qixian Zhong, Jonas W. Mueller, Jane-ling Wang","doi":"10.1214/21-aos2153","DOIUrl":"https://doi.org/10.1214/21-aos2153","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79321490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data-adaptive method for estimating density level sets under shape conditions","authors":"A. Rodríguez-Casal, P. Saavedra-Nieves","doi":"10.1214/21-aos2168","DOIUrl":"https://doi.org/10.1214/21-aos2168","url":null,"abstract":"Given a random sample of points from some unknown density, we propose a method for estimating density level sets, for a given threshold t, under the r ́convexity assumption. This shape condition generalizes the convexity property and allows to consider level sets with more than one connected component. The main problem in practice is that r is an unknown geometric characteristic of the set related to its curvature, which may depend on t. A stochastic algorithm is proposed for selecting its value from data. The resulting reconstruction of the level set is able to achieve minimax rates for Hausdorff metric and distance in measure uniformly on the level t.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75614441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistent order selection for ARFIMA processes","authors":"Hsueh-Han Huang, N. Chan, Kun Chen, C. Ing","doi":"10.1214/21-aos2149","DOIUrl":"https://doi.org/10.1214/21-aos2149","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"386 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75542601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?","authors":"Étienne Roquain, N. Verzelen","doi":"10.1214/21-aos2141","DOIUrl":"https://doi.org/10.1214/21-aos2141","url":null,"abstract":"Classical multiple testing theory prescribes the null distribution, which is often a too stringent assumption for nowadays large scale experiments. This paper presents theoretical foundations to understand the limitations caused by ignoring the null distribution, and how it can be properly learned from the (same) data-set, when possible. We explore this issue in the case where the null distributions are Gaussian with an unknown rescaling parameters (mean and variance) and the alternative distribution is let arbitrary. While an oracle procedure in that case is the Benjamini Hochberg procedure applied with the true (unknown) null distribution, we pursue the aim of building a procedure that asymptotically mimics the performance of the oracle (AMO in short). Our main result states that an AMO procedure exists if and only if the sparsity parameter k (number of false nulls) is of order less than n/ log(n), where n is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. Given our impossibility results, we also pursue a weaker objective, which is to find a confidence region for the oracle. To this end, we develop a distribution-dependent confidence region for the null distribution. As practical by-products, this provides a goodness of fit test for the null distribution, as well as a visual method assessing the reliability of empirical null multiple testing methods. Our results are illustrated with numerical experiments and a companion vignette Roquain and Verzelen (2020). AMS 2000 subject classifications: Primary 62G10; secondary 62C20.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"227 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80166690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive estimation in multivariate response regression with hidden variables","authors":"Xin Bing, Y. Ning, Yaosheng Xu","doi":"10.1214/21-aos2059","DOIUrl":"https://doi.org/10.1214/21-aos2059","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"133 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84919837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric copula adjusted for non- and semiparametric regression","authors":"Yue Zhao, I. Gijbels, I. Van Keilegom","doi":"10.1214/21-aos2126","DOIUrl":"https://doi.org/10.1214/21-aos2126","url":null,"abstract":"","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74837633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse high-dimensional linear regression. Estimating squared error and a phase transition","authors":"D. Gamarnik, Ilias Zadik","doi":"10.1214/21-aos2130","DOIUrl":"https://doi.org/10.1214/21-aos2130","url":null,"abstract":"We consider a sparse high dimensional regression model where the goal is to recover a k-sparse unknown binary vector β∗ from n noisy linear observations of the form Y = Xβ∗+W ∈ R where X ∈ Rn×p has i.i.d. N(0, 1) entries and W ∈ R has i.i.d. N(0, σ) entries. In the high signal-to-noise ratio regime and sublinear sparsity regime, while the order of the sample size needed to recover the unknown vector information-theoretially is known to be n∗ := 2k log p/ log(k/σ + 1), no polynomial-time algorithm is known to succeed unless n > nalg := (2k + σ) log p. In this work, we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n ∈ [n∗, nalg]. First, we establish a novel information-theoretic property of the MLE of the problem happening around n = n∗ samples, which we coin as an “all-or-nothing behavior”: when n > n∗ it recovers almost perfectly the support of β∗, while if n < n∗ it fails to recover any fraction of it correctly. Second, at an attempt to understand the computational hardness in the regime n ∈ [n∗, nalg] we prove that at order nalg samples there is an Overlap Gap Property (OGP) phase transition occurring at the landscape of the MLE: for constants c, C > 0 when n < cnalg OGP appears in the landscape of MLE while if n > Cnalg OGP disappears. OGP is a geometric “disconnectivity” property which initially appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained to establish the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to n < cnalg samples and the success of a simple Local Search method with access to n > Cnalg samples.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"2000 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88295763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On resampling schemes for particle filters with weakly informative observations","authors":"N. Chopin, Sumeetpal S. Singh, Tom'as Soto, M. Vihola","doi":"10.1214/22-aos2222","DOIUrl":"https://doi.org/10.1214/22-aos2222","url":null,"abstract":"We consider particle filters with weakly informative observations (or `potentials') relative to the latent state dynamics. The particular focus of this work is on particle filters to approximate time-discretisations of continuous-time Feynman--Kac path integral models -- a scenario that naturally arises when addressing filtering and smoothing problems in continuous time -- but our findings are indicative about weakly informative settings beyond this context too. We study the performance of different resampling schemes, such as systematic resampling, SSP (Srinivasan sampling process) and stratified resampling, as the time-discretisation becomes finer and also identify their continuous-time limit, which is expressed as a suitably defined `infinitesimal generator.' By contrasting these generators, we find that (certain modifications of) systematic and SSP resampling `dominate' stratified and independent `killing' resampling in terms of their limiting overall resampling rate. The reduced intensity of resampling manifests itself in lower variance in our numerical experiment. This efficiency result, through an ordering of the resampling rate, is new to the literature. The second major contribution of this work concerns the analysis of the limiting behaviour of the entire population of particles of the particle filter as the time discretisation becomes finer. We provide the first proof, under general conditions, that the particle approximation of the discretised continuous-time Feynman--Kac path integral models converges to a (uniformly weighted) continuous-time particle system.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"128 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82914734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new and flexible design construction for orthogonal arrays for modern applications","authors":"Yuanzhen He, C. D. Lin, Fasheng Sun","doi":"10.1214/21-aos2159","DOIUrl":"https://doi.org/10.1214/21-aos2159","url":null,"abstract":"Orthogonal array, a classical and effective tool for collecting data, has been flourished with its applications in modern computer experiments and engineering statistics. Driven by the wide use of computer experiments with both qualitative and quantitative factors, multiple computer experiments, multi-fidelity computer experiments, cross-validation and stochastic optimization, orthogonal arrays with certain structures have been introduced. Sliced orthogonal arrays and nested orthogonal arrays are examples of such arrays. This article introduces a flexible, fresh construction method which uses smaller arrays and a special structure. The method uncovers the hidden structure of many existing fixed-level orthogonal arrays of given run sizes, possibly with more columns. It also allows fixed-level orthogonal arrays of nearly strength three to be constructed, which are useful as there are not many construction methods for fixed-level orthogonal arrays of strength three, and also helpful for generating Latin hypercube designs with desirable low-dimensional projections. Theoretical properties of the proposed method are explored. As by-products, several theoretical results on orthogonal arrays are obtained.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89375175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}