Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

Jingbo Liu
{"title":"Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection","authors":"Jingbo Liu","doi":"arxiv-2405.03063","DOIUrl":null,"url":null,"abstract":"Suppose that we first apply the Lasso to a design matrix, and then update one\nof its columns. In general, the signs of the Lasso coefficients may change, and\nthere is no closed-form expression for updating the Lasso solution exactly. In\nthis work, we propose an approximate formula for updating a debiased Lasso\ncoefficient. We provide general nonasymptotic error bounds in terms of the\nnorms and correlations of a given design matrix's columns, and then prove\nasymptotic convergence results for the case of a random design matrix with\ni.i.d.\\ sub-Gaussian row vectors and i.i.d.\\ Gaussian noise. Notably, the\napproximate formula is asymptotically correct for most coordinates in the\nproportional growth regime, under the mild assumption that each row of the\ndesign matrix is sub-Gaussian with a covariance matrix having a bounded\ncondition number. Our proof only requires certain concentration and\nanti-concentration properties to control various error terms and the number of\nsign changes. In contrast, rigorously establishing distributional limit\nproperties (e.g.\\ Gaussian limits for the debiased Lasso) under similarly\ngeneral assumptions has been considered open problem in the universality\ntheory. 
As applications, we show that the approximate formula allows us to\nreduce the computation complexity of variable selection algorithms that require\nsolving multiple Lasso problems, such as the conditional randomization test and\na variant of the knockoff filter.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"118 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.03063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g. Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
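The phenomenon motivating the paper can be illustrated with a toy example. The sketch below (a minimal pure-Python coordinate-descent Lasso; it does not reproduce the paper's approximate debiased-update formula) solves the Lasso, then replaces one column of the design matrix with a freshly resampled column, as a conditional randomization test would, and re-solves. The support and signs of the solution can change after the column swap, which is why no exact closed-form update exists and why re-solving from scratch is the naive default.

```python
# Toy demonstration: updating one column of the design matrix can change
# the Lasso solution's support/signs. All names here are illustrative.
import random

def lasso_cd(X, y, lam, iters=500):
    """Coordinate-descent Lasso: min_b (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p

    def soft(z, t):  # soft-thresholding operator
        return (z - t) if z > t else (z + t) if z < -t else 0.0

    for _ in range(iters):
        for j in range(p):
            # rho_j = (1/n) x_j^T (partial residual excluding coordinate j)
            rho, norm2 = 0.0, 0.0
            for i in range(n):
                r_i = y[i] - sum(X[i][k] * b[k] for k in range(p)) + X[i][j] * b[j]
                rho += X[i][j] * r_i
                norm2 += X[i][j] ** 2
            b[j] = soft(rho / n, lam) / (norm2 / n)
    return b

random.seed(0)
n, p = 50, 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
beta = [1.5, -2.0, 0.0]  # true coefficients: coordinate 2 is null
y = [sum(X[i][j] * beta[j] for j in range(p)) + 0.1 * random.gauss(0, 1)
     for i in range(n)]

b_before = lasso_cd(X, y, lam=0.1)

# Replace column 0 with an independently resampled column (CRT-style swap).
for i in range(n):
    X[i][0] = random.gauss(0, 1)
b_after = lasso_cd(X, y, lam=0.1)

sign = lambda v: 0 if abs(v) < 1e-8 else (1 if v > 0 else -1)
print("signs before:", [sign(v) for v in b_before])
print("signs after: ", [sign(v) for v in b_after])
```

A CRT for p variables naively requires one full Lasso solve per resampled column per variable; the paper's approximate update formula replaces each of those re-solves with a cheap correction to the debiased coefficient.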