{"title":"广义去偏拉索的稳定性及其在基于重采样的变量选择中的应用","authors":"Jingbo Liu","doi":"arxiv-2405.03063","DOIUrl":null,"url":null,"abstract":"Suppose that we first apply the Lasso to a design matrix, and then update one\nof its columns. In general, the signs of the Lasso coefficients may change, and\nthere is no closed-form expression for updating the Lasso solution exactly. In\nthis work, we propose an approximate formula for updating a debiased Lasso\ncoefficient. We provide general nonasymptotic error bounds in terms of the\nnorms and correlations of a given design matrix's columns, and then prove\nasymptotic convergence results for the case of a random design matrix with\ni.i.d.\\ sub-Gaussian row vectors and i.i.d.\\ Gaussian noise. Notably, the\napproximate formula is asymptotically correct for most coordinates in the\nproportional growth regime, under the mild assumption that each row of the\ndesign matrix is sub-Gaussian with a covariance matrix having a bounded\ncondition number. Our proof only requires certain concentration and\nanti-concentration properties to control various error terms and the number of\nsign changes. In contrast, rigorously establishing distributional limit\nproperties (e.g.\\ Gaussian limits for the debiased Lasso) under similarly\ngeneral assumptions has been considered open problem in the universality\ntheory. As applications, we show that the approximate formula allows us to\nreduce the computation complexity of variable selection algorithms that require\nsolving multiple Lasso problems, such as the conditional randomization test and\na variant of the knockoff filter.","PeriodicalId":501330,"journal":{"name":"arXiv - MATH - Statistics Theory","volume":"118 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection\",\"authors\":\"Jingbo Liu\",\"doi\":\"arxiv-2405.03063\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Suppose that we first apply the Lasso to a design matrix, and then update one\\nof its columns. In general, the signs of the Lasso coefficients may change, and\\nthere is no closed-form expression for updating the Lasso solution exactly. In\\nthis work, we propose an approximate formula for updating a debiased Lasso\\ncoefficient. We provide general nonasymptotic error bounds in terms of the\\nnorms and correlations of a given design matrix's columns, and then prove\\nasymptotic convergence results for the case of a random design matrix with\\ni.i.d.\\\\ sub-Gaussian row vectors and i.i.d.\\\\ Gaussian noise. Notably, the\\napproximate formula is asymptotically correct for most coordinates in the\\nproportional growth regime, under the mild assumption that each row of the\\ndesign matrix is sub-Gaussian with a covariance matrix having a bounded\\ncondition number. Our proof only requires certain concentration and\\nanti-concentration properties to control various error terms and the number of\\nsign changes. In contrast, rigorously establishing distributional limit\\nproperties (e.g.\\\\ Gaussian limits for the debiased Lasso) under similarly\\ngeneral assumptions has been considered open problem in the universality\\ntheory. 
As applications, we show that the approximate formula allows us to\\nreduce the computation complexity of variable selection algorithms that require\\nsolving multiple Lasso problems, such as the conditional randomization test and\\na variant of the knockoff filter.\",\"PeriodicalId\":501330,\"journal\":{\"name\":\"arXiv - MATH - Statistics Theory\",\"volume\":\"118 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - MATH - Statistics Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2405.03063\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.03063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection
Suppose that we first apply the Lasso to a design matrix, and then update one
of its columns. In general, the signs of the Lasso coefficients may change, and
there is no closed-form expression for updating the Lasso solution exactly. In
this work, we propose an approximate formula for updating a debiased Lasso
coefficient. We provide general nonasymptotic error bounds in terms of the
norms and correlations of a given design matrix's columns, and then prove
asymptotic convergence results for the case of a random design matrix with
i.i.d. sub-Gaussian row vectors and i.i.d. Gaussian noise. Notably, the
approximate formula is asymptotically correct for most coordinates in the
proportional growth regime, under the mild assumption that each row of the
design matrix is sub-Gaussian with a covariance matrix having a bounded
condition number. Our proof only requires certain concentration and
anti-concentration properties to control various error terms and the number of
sign changes. In contrast, rigorously establishing distributional limit properties (e.g., Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula allows us to reduce the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter (see the sketches following the abstract).
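Background (an editorial sketch, not the paper's generalized construction): a natural reference point for the "generalized debiased Lasso" of the title is the standard debiased Lasso of Zhang-Zhang and Javanmard-Montanari. In coordinate-wise form, with x_j the j-th column of X and z_j a score vector that decorrelates x_j from the remaining columns (e.g., the residual of a nodewise Lasso regression of x_j on them), it reads

% Standard debiased Lasso; the paper's generalized version may differ.
\[
  \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
  \qquad
  \hat{\beta}^{\mathrm{d}}_j
    = \hat{\beta}_j + \frac{z_j^\top (y - X\hat{\beta})}{z_j^\top x_j}.
\]

Updating a column x_j changes both the Lasso solution and the score vector, and the abstract's claim is that the debiased coordinate nonetheless admits an accurate approximate update without an exact re-solve.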
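To illustrate why a cheap update matters, here is a minimal, hypothetical Python sketch of the conditional randomization test (CRT): each resample of column j forces a fresh Lasso solve, which is exactly the per-resample cost an approximate update formula would remove. The helper debiased_coef, the penalty level, and the simulated data are illustrative assumptions, not the paper's algorithm.

# Editorial sketch: naive CRT with exact refits (not the paper's method).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, j, B = 200, 50, 0, 100            # samples, features, tested column, resamples
X = rng.standard_normal((n, p))
y = X[:, 1] + 0.5 * rng.standard_normal(n)   # column j=0 is truly null here

def debiased_coef(X, y, j, lam=0.1):
    """Standard one-coordinate debiased Lasso via a nodewise score vector.
    (Background construction; the paper's generalized version may differ.)"""
    beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    # Score vector z_j: residual of a Lasso regression of x_j on the rest.
    X_rest = np.delete(X, j, axis=1)
    gamma = Lasso(alpha=lam, fit_intercept=False).fit(X_rest, X[:, j]).coef_
    z = X[:, j] - X_rest @ gamma
    return beta[j] + z @ (y - X @ beta) / (z @ X[:, j])

stat_obs = abs(debiased_coef(X, y, j))
exceed = 0
for _ in range(B):
    X_b = X.copy()
    # CRT resample: draw column j from its conditional law given the others;
    # here the columns are independent standard normals, so this is exact.
    X_b[:, j] = rng.standard_normal(n)
    # Bottleneck: one exact Lasso refit per resample. The paper's approximate
    # update of the single debiased coefficient would replace this step.
    exceed += abs(debiased_coef(X_b, y, j)) >= stat_obs
p_value = (1 + exceed) / (1 + B)
print(f"CRT p-value for null feature {j}: {p_value:.3f}")

The naive loop above costs B full Lasso solves (plus the nodewise fits); replacing the marked refit with an approximate update of one debiased coordinate is the computational saving the abstract refers to, for both the CRT and the knockoff-filter variant it mentions.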