Necessary and sufficient conditions for variable selection consistency of the LASSO in high dimensions
Author: S. Lahiri
Journal: Annals of Statistics (JCR Q1, Statistics & Probability; impact factor 3.2)
Publication date: 2021-04-01
DOI: 10.1214/20-AOS1979 (https://doi.org/10.1214/20-AOS1979)
Citations: 12
Abstract
Necessary and sufficient conditions for variable selection consistency of the LASSO in high dimensions
This paper investigates conditions for variable selection consistency of the LASSO in high-dimensional regression models and gives necessary and sufficient conditions for it, potentially allowing the model dimension p to grow arbitrarily fast as a function of the sample size n. These conditions require both upper and lower bounds on the growth rate of the penalty parameter. It turns out that a variant of the irrepresentable condition (IRC) of Zhao and Yu (2006), herein called the lower irrepresentable condition (LIRC), is determined by the lower-bound considerations, while the upper-bound considerations lead to a new condition, called the upper irrepresentable condition (UIRC) in this paper. It is shown that the LIRC together with the UIRC is necessary and sufficient for the variable selection consistency of the LASSO, thereby settling a conjecture of Zhao and Yu (2006). Further, it is shown that under some mild regularity conditions, the penalty parameter must tend to infinity at a certain minimal rate to ensure variable selection consistency of the LASSO, and that the corresponding LASSO estimators of the nonzero regression parameters cannot be √n-consistent (even for individual parameters). Thus, under fairly general conditions, the LASSO with a single choice of the penalty parameter cannot achieve both variable selection consistency and √n-consistency simultaneously.
MSC 2010 subject classifications: Primary 62E20; secondary 62J05.
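For orientation, the classical strong irrepresentable condition of Zhao and Yu (2006), of which the LIRC is described as a variant, can be checked numerically for a given Gram matrix. The sketch below is illustrative only: it implements the classical condition, namely that the largest value of |C_{jS} C_{SS}^{-1} sign(β_S)| over inactive variables j stays below 1, and not the paper's LIRC or UIRC, whose exact definitions are not given in the abstract. The function names `solve` and `irc_value` and the example matrices are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def irc_value(C, active, signs):
    """Return max over inactive j of |C[j, S] @ inv(C[S, S]) @ signs|.

    C      : Gram matrix X'X / n as a list of lists (assumed input)
    active : indices S of the truly nonzero coefficients
    signs  : signs (+1.0 / -1.0) of those coefficients, in the same order
    The classical strong irrepresentable condition asks that this
    value be bounded away from 1 from below.
    """
    C11 = [[C[i][j] for j in active] for i in active]
    w = solve(C11, signs)  # w = inv(C11) @ signs
    inactive = [j for j in range(len(C)) if j not in active]
    return max(
        abs(sum(C[j][k] * w[i] for i, k in enumerate(active)))
        for j in inactive
    )


# Equicorrelated design (rho = 0.2): condition holds, value is about 0.333.
C_ok = [[1.0, 0.2, 0.2],
        [0.2, 1.0, 0.2],
        [0.2, 0.2, 1.0]]

# Inactive variable strongly correlated with both active ones:
# condition fails, value is about 1.4.
C_bad = [[1.0, 0.0, 0.7],
         [0.0, 1.0, 0.7],
         [0.7, 0.7, 1.0]]

print(irc_value(C_ok, [0, 1], [1.0, 1.0]))
print(irc_value(C_bad, [0, 1], [1.0, 1.0]))
```

In the second design the inactive variable is almost a linear combination of the two active ones, which is the kind of situation in which the LASSO is known to select a wrong variable regardless of how the penalty parameter is tuned.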
About the journal:
The Annals of Statistics aims to publish research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course, many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.