{"title":"Asymptotic risk and phase transition of $l_{1}$-penalized robust estimator","authors":"Hanwen Huang","doi":"10.1214/19-AOS1923","DOIUrl":"https://doi.org/10.1214/19-AOS1923","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48301749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which bridge estimator is the best for variable selection?","authors":"Shuaiwen Wang, Haolei Weng, A. Maleki","doi":"10.1214/19-AOS1906","DOIUrl":"https://doi.org/10.1214/19-AOS1906","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47315145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation and inference for precision matrices of nonstationary time series","authors":"Xiucai Ding, Zhou Zhou","doi":"10.1214/19-aos1894","DOIUrl":"https://doi.org/10.1214/19-aos1894","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49371694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double-slicing assisted sufficient dimension reduction for high-dimensional censored data","authors":"Shanshan Ding, W. Qian, Lan Wang","doi":"10.1214/19-aos1880","DOIUrl":"https://doi.org/10.1214/19-aos1880","url":null,"abstract":"This paper provides a unified framework and an efficient algorithm for analyzing high-dimensional survival data under weak modeling assumptions. In particular, it imposes neither parametric distributional assumption nor linear regression assumption. It only assumes that the survival time T depends on a high-dimensional covariate vector X through low-dimensional linear combinations of covariates ΓX. The censoring time is allowed to be conditionally independent of the survival time given the covariates. This general framework includes many popular parametric and semiparametric survival regression models as special cases. The proposed algorithm produces a number of practically useful outputs with theoretical guarantees, including a consistent estimate of the sufficient dimension reduction subspace of T |X, a uniformly consistent Kaplan-Meier type estimator of the conditional distribution function of T and a consistent estimator of the conditional quantile survival time. Our asymptotic results significantly extend the classical theory of sufficient dimension reduction for censored data (particularly that of Li et al. 1999) and the celebrated nonparametric Kaplan-Meier estimator to the setting where the number of covariates p diverges exponentially fast with the sample size n. We demonstrate the promising performance of the proposed new estimators through simulations and a real data example.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44672142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
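The record above concerns a Kaplan-Meier type estimator for censored survival data. As context, here is a minimal sketch of the classical (unconditional) Kaplan-Meier product-limit estimator that the paper generalizes — not the paper's double-slicing procedure, just the textbook baseline:

```python
import numpy as np

def kaplan_meier(times, events):
    """Classical Kaplan-Meier survival curve.
    times: observed times (event or censoring); events: 1 = event, 0 = censored.
    Returns the distinct event times and the survival probabilities S(t)."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    uniq = np.unique(times[events == 1])          # distinct event times, sorted
    surv = []
    s = 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)              # subjects at risk just before t
        d = np.sum((times == t) & (events == 1))  # events occurring at t
        s *= 1.0 - d / at_risk                    # product-limit update
        surv.append(s)
    return uniq, np.array(surv)

# Six subjects: censored observations at t=2 and t=4 do not drop the curve
t, s = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

With this toy sample the curve steps down at t = 1, 2, 3, 5, e.g. S(1) = 5/6 and S(2) = (5/6)(4/5) = 2/3; the censored subjects only shrink the risk set.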
{"title":"Extending the validity of frequency domain bootstrap methods to general stationary processes","authors":"M. Meyer, E. Paparoditis, Jens-Peter Kreiss","doi":"10.1214/19-aos1892","DOIUrl":"https://doi.org/10.1214/19-aos1892","url":null,"abstract":"Existing frequency domain methods for bootstrapping time series have a limited range. Essentially, these procedures cover the case of linear time series with independent innovations, and some even require the time series to be Gaussian. In this paper we propose a new frequency domain bootstrap method – the hybrid periodogram bootstrap (HPB) – which is consistent for a much wider range of stationary, even nonlinear, processes and which can be applied to a large class of periodogram-based statistics. The HPB is designed to combine desirable features of different frequency domain techniques while overcoming their respective limitations. It is capable to imitate the weak dependence structure of the periodogram by invoking the concept of convolved subsampling in a novel way that is tailor-made for periodograms. We show consistency for the HPB procedure for a general class of stationary time series, ranging clearly beyond linear processes, and for spectral means and ratio statistics, on which we mainly focus. The finite sample performance of the new bootstrap procedure is illustrated via simulations.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49428900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
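The abstract above extends frequency domain bootstrap methods beyond linear processes. As background, here is a minimal sketch of the classical multiplicative periodogram bootstrap (the kind of pre-existing method the HPB improves on), under the assumption I(w_j) ≈ f(w_j)·eps_j with approximately i.i.d. multiplicative errors; the smoothing bandwidth and statistic are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def periodogram(x):
    """Periodogram I(w_j) at the positive Fourier frequencies w_j = 2*pi*j/n."""
    n = len(x)
    freqs = np.arange(1, n // 2 + 1) * 2 * np.pi / n
    I = np.abs(np.fft.fft(x)[1 : n // 2 + 1]) ** 2 / (2 * np.pi * n)
    return freqs, I

def pgram_bootstrap(x, stat, B=200):
    """Classical multiplicative periodogram bootstrap for a periodogram-based
    statistic: estimate f by smoothing I, resample the rescaled ratios I/f_hat."""
    freqs, I = periodogram(x)
    k = 5                                     # crude moving-average spectral estimate
    fhat = np.convolve(I, np.ones(2 * k + 1) / (2 * k + 1), mode="same")
    eps = I / fhat
    eps /= eps.mean()                         # rescale residuals to mean one
    reps = np.empty(B)
    for b in range(B):
        I_star = fhat * rng.choice(eps, size=len(eps), replace=True)
        reps[b] = stat(freqs, I_star)
    return reps

def spectral_mean(w, I):
    return np.mean(I) * np.pi                 # Riemann sum of I over (0, pi]

x = rng.normal(size=512)                      # white noise: flat spectrum
reps = pgram_bootstrap(x, spectral_mean)
```

For Gaussian white noise this i.i.d.-ratio scheme is valid; the paper's point is that for general (nonlinear) stationary processes the periodogram ordinates are weakly dependent, which is what the HPB's convolved subsampling is designed to capture.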
{"title":"Beyond HC: More sensitive tests for rare/weak alternatives","authors":"Thomas Porter, M. Stewart","doi":"10.1214/19-aos1885","DOIUrl":"https://doi.org/10.1214/19-aos1885","url":null,"abstract":"Higher criticism (HC) is a popular method for large-scale inference problems based on identifying unusually high proportions of small pvalues. It has been shown to enjoy a lower-order optimality property in a simple normal location mixture model which is shared by the ‘tailor-made’ parametric generalised likelihood ratio test (GLRT) for the same model, however HC has also been shown to perform well outside this ‘narrow’ model. We develop a higher-order framework for analysing the power of these and similar procedures, which reveals the perhaps unsurprising fact that the GLRT enjoys an edge in power over HC for the normal location mixture model. We also identify a similar parametric mixture model to which HC is similarly ‘tailor-made’ and show that the situation is (at least partly) reversed there. We also show that in the normal location mixture model a procedure based on the empirical moment-generating function enjoys the same local power properties as the GLRT and may be recommended as an easy to implement (and interpret), complementary procedure to HC. Some other practical advice regarding the implementation of these procedures is provided. Finally we provide some simulation results to help interpret our theoretical findings.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44608637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
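The abstract above builds on the higher criticism statistic. As a concrete reference point, here is a minimal implementation of the standard Donoho-Jin HC statistic (the baseline the paper compares against, not the paper's new procedures), with edge clipping as an illustrative regularization choice:

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """Donoho-Jin higher-criticism statistic: the largest standardized gap
    between the empirical p-value CDF and the uniform CDF, maximized over
    the smallest alpha0-fraction of p-values."""
    p = np.sort(np.asarray(pvals, float))
    n = len(p)
    p = np.clip(p, 1.0 / n, 1.0 - 1.0 / n)   # avoid division by zero at the edges
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return np.max(hc[: max(1, int(alpha0 * n))])

rng = np.random.default_rng(0)
null_p = rng.uniform(size=10_000)             # global null: uniform p-values
alt_p = null_p.copy()
alt_p[:50] = 1e-6                             # rare/weak-style contamination
```

Under the null, HC grows only like sqrt(2 log log n), so even 50 tiny p-values among 10,000 push `higher_criticism(alt_p)` far above `higher_criticism(null_p)`.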
{"title":"Bayesian analysis of the covariance matrix of a multivariate normal distribution with a new class of priors","authors":"J. Berger, Dongchu Sun, Chengyuan Song","doi":"10.1214/19-aos1891","DOIUrl":"https://doi.org/10.1214/19-aos1891","url":null,"abstract":"Bayesian analysis for the covariance matrix of a multivariate normal distribution has received a lot of attention in the last two decades. In this paper, we propose a new class of priors for the covariance matrix, including both inverse Wishart and reference priors as special cases. The main motivation for the new class is to have available priors – both subjective and objective – that do not “force eigenvalues apart,” which is a criticism of inverse Wishart and Jeffreys priors. Extensive comparison of these ‘shrinkage priors’ with inverse Wishart and Jeffreys priors is undertaken, with the new priors seeming to have considerably better performance. A number of curious facts about the new priors are also observed, such as that the posterior distribution will be proper with just three vector observations from the multivariate normal distribution – regardless of the dimension of the covariance matrix – and that useful inference about features of the covariance matrix can be possible. Finally, a new MCMC algorithm is developed for this class of priors and is shown to be computationally effective for matrices of up to 100 dimensions.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48597865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
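The record above proposes a new prior class for normal covariance matrices. For context, here is the standard conjugate inverse-Wishart analysis it generalizes — the textbook baseline, not the paper's shrinkage priors or its MCMC algorithm; the hyperparameters below are illustrative:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)
d, n = 3, 50
Sigma_true = np.diag([3.0, 1.0, 0.5])
X = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
S = X.T @ X                                   # scatter matrix of the sample

# Conjugate update for known-mean normal data:
# prior IW(nu0, Psi0)  ->  posterior IW(nu0 + n, Psi0 + S)
nu0, Psi0 = d + 2, np.eye(d)
draws = invwishart.rvs(df=nu0 + n, scale=Psi0 + S, size=2000, random_state=rng)
post_mean = draws.mean(axis=0)                # Monte Carlo posterior mean of Sigma
```

The paper's criticism of this conjugate family (and of the Jeffreys prior) is that it tends to "force eigenvalues apart"; its proposed class keeps conjugate-style tractability while shrinking the eigenvalue spread.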
{"title":"ENTRYWISE EIGENVECTOR ANALYSIS OF RANDOM MATRICES WITH LOW EXPECTED RANK.","authors":"Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, Yiqiao Zhong","doi":"10.1214/19-aos1854","DOIUrl":"10.1214/19-aos1854","url":null,"abstract":"<p><p>Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates entrywise behaviors of eigenvectors for a large class of random matrices whose expectations are low-rank, which helps settle the conjecture in Abbe et al. (2014b) that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The key is a first-order approximation of eigenvectors under the <i>ℓ</i> <sub>∞</sub> norm: <dispformula> <math> <mrow><msub><mi>u</mi> <mi>k</mi></msub> <mo>≈</mo> <mfrac><mrow><mi>A</mi> <msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mrow><msubsup><mi>λ</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </mfrac> <mo>,</mo></mrow> </math> </dispformula> where {<i>u</i> <sub><i>k</i></sub> } and <math> <mrow><mrow><mo>{</mo> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mo>}</mo></mrow> </mrow> </math> are eigenvectors of a random matrix <i>A</i> and its expectation <math><mrow><mi>E</mi> <mi>A</mi></mrow> </math> , respectively. The fact that the approximation is both tight and linear in <i>A</i> facilitates sharp comparisons between <i>u</i> <sub><i>k</i></sub> and <math> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </math> . In particular, it allows for comparing the signs of <i>u</i> <sub><i>k</i></sub> and <math> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </math> even if <math> <mrow> <msub> <mrow><mrow><mo>‖</mo> <mrow><msub><mi>u</mi> <mi>k</mi></msub> <mo>-</mo> <msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mo>‖</mo></mrow> </mrow> <mi>∞</mi></msub> </mrow> </math> is large. The results are further extended to perturbations of eigenspaces, yielding new <i>ℓ</i> <sub>∞</sub>-type bounds for synchronization ( <math> <mrow><msub><mi>ℤ</mi> <mn>2</mn></msub> </mrow> </math> -spiked Wigner model) and noisy matrix completion.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8046180/pdf/nihms-1053828.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38877757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
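The key formula in the abstract above, u_k ≈ A u_k*/λ_k*, is easy to check numerically. Here is a small sketch on a rank-one spiked-Wigner-style matrix (the model parameters are illustrative choices, not from the paper): the entrywise error of the linear surrogate A u*/λ* should be much smaller than that of u* itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Rank-one expected matrix EA = lambda* u* u*^T
u_star = np.ones(n) / np.sqrt(n)
lam_star = 30.0
EA = lam_star * np.outer(u_star, u_star)

# Observed matrix: A = EA + symmetric Gaussian noise (Wigner-type)
G = rng.normal(size=(n, n))
A = EA + (G + G.T) / np.sqrt(2 * n)

# Leading empirical eigenvector, sign-aligned with u*
u = np.linalg.eigh(A)[1][:, -1]
u = u if u @ u_star > 0 else -u

# First-order approximation from the abstract: u_k ≈ A u_k* / lambda_k*
u_lin = A @ u_star / lam_star

err_linear = np.max(np.abs(u - u_lin))   # entrywise error of the linear surrogate
err_naive = np.max(np.abs(u - u_star))   # entrywise error of u* itself
```

Because the surrogate is linear in A, its error is second-order in the noise, which is what makes entrywise sign comparisons possible even when the raw perturbation is large.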
{"title":"GRID: A variable selection and structure discovery method for high dimensional nonparametric regression","authors":"F. Giordano, S. Lahiri, M. L. Parrella","doi":"10.1214/19-aos1846","DOIUrl":"https://doi.org/10.1214/19-aos1846","url":null,"abstract":"We consider nonparametric regression in high dimensions where only a relatively small subset of a large number of variables are relevant and may have nonlinear effects on the response. We develop methods for variable selection, structure discovery and estimation of the true low-dimensional regression function, allowing any degree of interactions among the relevant variables that need not be specified a-priori. The proposed method, called the GRID, combines empirical likelihood based marginal testing with the local linear estimation machinery in a novel way to select the relevant variables. Further, it provides a simple graphical tool for identifying the low dimensional nonlinear structure of the regression function. Theoretical results establish consistency of variable selection and structure discovery, and also Oracle risk property of the GRID estimator of the regression function, allowing the dimension d of the covariates to grow with the sample size n at the rate d = O(n^a) for any a ∈ (0,∞) and the number of relevant covariates r to grow at a rate r = O(n^γ) for some γ ∈ (0, 1) under some regularity conditions that, in particular, require finiteness of certain absolute moments of the error variables depending on a. Finite sample properties of the GRID are investigated in a moderately large simulation study.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48408022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
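The GRID abstract above names "the local linear estimation machinery" as one of its ingredients. Here is a minimal sketch of a one-dimensional local linear smoother — the generic building block, not the GRID procedure itself; the Gaussian kernel and bandwidth are illustrative choices:

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of m(x0) = E[Y | X = x0]: weighted least squares
    of y on (1, x - x0) with Gaussian kernel weights, returning the intercept."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    XtW = X.T * w                       # kernel weights broadcast over columns
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                      # intercept = fitted value at x0

x = np.linspace(0, 1, 200)
m_hat = local_linear(0.5, x, 2.0 * x, h=0.1)
```

A useful design property (and one reason local linear fitting is preferred to the local constant Nadaraya-Watson estimator): it reproduces linear regression functions exactly, so `m_hat` above recovers 2 * 0.5 = 1 up to numerical error.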