Discussion of "Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons"

R. Mazumder

DOI: 10.1214/20-sts807
Citations: 4
Abstract
I warmly congratulate the authors Hastie, Tibshirani and Tibshirani (HTT) and Bertsimas, Pauphilet and Van Parys (BPV) for their excellent contributions and important perspectives on sparse regression. Due to space constraints, and my greater familiarity with the content and context of HTT (I have had numerous fruitful discussions with the authors regarding their work), I will focus my discussion on the HTT paper. HTT nicely articulate the relative merits of three canonical estimators in sparse regression: L0, L1 and (forward) stepwise selection. I am humbled that a premise of their work is an article I wrote with Bertsimas and King [4] (BKM). BKM showed that current Mixed Integer Optimization (MIO) algorithms allow us to compute best subset solutions for problem instances (p ≈ 1000 features) much larger than a previous benchmark (the best-subsets software in the R package leaps), which could only handle instances with p ≈ 30. HTT, by extending and refining the experiments performed by BKM, have helped clarify and deepen our understanding of L0, L1 and stepwise regression. They raise several intriguing questions that perhaps deserve further attention from the wider statistics and optimization communities. In this commentary, I will focus on some of the key points discussed in HTT, with a bias toward some of the recent work I have been involved in. There is a large and rich body of work in high-dimensional statistics and related optimization techniques that I will not be able to discuss within the limited scope of my commentary.
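To make the contrast between two of the canonical estimators concrete, here is a minimal, illustrative sketch (not the authors' code or the MIO formulation of BKM): best subset selection (L0) by exhaustive enumeration versus greedy forward stepwise selection, on a small synthetic problem. The data-generating setup and all function names are hypothetical; exhaustive enumeration stands in for an MIO solver and is feasible only for small p.

```python
# Illustrative comparison of best subset (L0, via exhaustive enumeration)
# and forward stepwise selection on a toy sparse-regression instance.
# This is a pedagogical sketch, not the MIO approach of BKM.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, p, k_true = 100, 8, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k_true] = [3.0, -2.0, 1.5]          # true sparse coefficient vector
y = X @ beta + 0.5 * rng.standard_normal(n)

def rss(support):
    """Residual sum of squares of least squares restricted to `support`."""
    if not support:
        return float(y @ y)
    coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    resid = y - X[:, support] @ coef
    return float(resid @ resid)

def best_subset(k):
    """L0: enumerate every size-k support; optimal but exponential in p."""
    return min(combinations(range(p), k), key=lambda S: rss(list(S)))

def forward_stepwise(k):
    """Greedy: repeatedly add the feature giving the largest RSS drop."""
    S = []
    for _ in range(k):
        j_best = min((j for j in range(p) if j not in S),
                     key=lambda j: rss(S + [j]))
        S.append(j_best)
    return tuple(sorted(S))

print("best subset:     ", best_subset(k_true))
print("forward stepwise:", forward_stepwise(k_true))
```

By construction, the L0 solution attains an RSS no larger than the stepwise one at the same support size; stepwise trades that optimality for a much cheaper greedy search, which is the tension HTT's experiments explore at scale.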