
Latest Publications in Statistical Science

Introduction to the Special Section
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2021-02-01 · DOI: 10.1214/20-sts361ed
Yihong Wu, Harrison H. Zhou
{"title":"Introduction to the Special Section","authors":"Yihong Wu, Harrison H. Zhou","doi":"10.1214/20-sts361ed","DOIUrl":"https://doi.org/10.1214/20-sts361ed","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42468701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Conversation with Tze Leung Lai
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2021-02-01 · DOI: 10.1214/20-sts775
Ying Lu, Dylan S. Small, Z. Ying
This conversation began in June 2015 in the Department of Statistics at Columbia University during Lai’s visit to his alma mater where he celebrated his seventieth birthday. It continued in the subsequent years at Columbia and Stanford. Lai was born on June 28, 1945, in Hong Kong, where he grew up and attended The University of Hong Kong, receiving his B.A. degree (First Class Honors) in Mathematics in 1967. He went to Columbia University in 1968 for graduate study in statistics and received his Ph.D. degree in 1971. He stayed on the faculty at Columbia and was appointed Higgins Professor of Mathematical Statistics in 1986. A year later he moved to Stanford, where he is currently Ray Lyman Wilbur Professor of Statistics, and by courtesy, also of Biomedical Data Science and Computational and Mathematical Engineering. He is a fellow of the Institute of Mathematical Statistics, the American Statistical Association and an elected member of Academia Sinica in Taiwan. He was the third recipient of the COPSS Award which he won in 1983. He has been married to Letitia Chow since 1975, and they have two sons and two grandchildren.
Citations: 0
Comparison of Two Frameworks for Analyzing Longitudinal Data
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2021-01-01 · DOI: 10.1214/20-sts813
Jie Zhou, Xiao Zhou, Liuquan Sun
Under the random design of longitudinal data, observation times are irregular, and there are mainly two frameworks for analyzing this kind of longitudinal data. One is the clustered data framework and the other is the counting process framework. In this paper, we give a thorough comparison of these two frameworks in terms of data structure, model assumptions and estimation procedures. We find that modeling the observation times in the counting process framework will not gain any efficiency when the observation times are correlated with covariates but independent of the longitudinal response given covariates. Some simulation studies are conducted to compare the finite sample behaviors of the related estimators, and a real data analysis of the Alzheimer’s disease study is implemented for further comparison.
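As a purely illustrative companion to this comparison (my own sketch, not code from the paper), the snippet below simulates the kind of data both frameworks target: visit times from a covariate-dependent Poisson process, so they are irregular and depend on the covariate but are independent of the response given the covariate, laid out in the "clustered data" format of one row per subject-visit. All names, rates and effect sizes are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_subject(tau=5.0, beta=1.0, gamma=0.5):
    """Simulate one subject over follow-up [0, tau] with irregular visit times.

    The visit process is Poisson with rate 1 + gamma*x (depends on the covariate x,
    but not on the response given x); the response follows a simple random-intercept
    linear model. All values are illustrative assumptions.
    """
    x = rng.integers(0, 2)                       # baseline covariate, e.g. treatment arm
    n_visits = rng.poisson((1.0 + gamma * x) * tau)
    times = np.sort(rng.uniform(0.0, tau, size=n_visits))
    b = rng.normal(0.0, 0.5)                     # subject-specific random intercept
    y = b + beta * x + 0.2 * times + rng.normal(0.0, 0.3, size=n_visits)
    return x, times, y

# "Clustered data" layout: one row per (subject, covariate, visit time, response).
rows = []
for i in range(200):
    x, times, ys = simulate_subject()
    rows.extend((i, x, t, yt) for t, yt in zip(times, ys))
print(f"{len(rows)} subject-visit rows from 200 subjects with irregular observation times")
```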
Citations: 0
Confidence as Likelihood
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2021-01-01 · DOI: 10.1214/20-sts811
Y. Pawitan, Youngjo Lee
Confidence and likelihood are fundamental statistical concepts with distinct technical interpretation and usage. Confidence is a meaningful concept of uncertainty within the context of the confidence-interval procedure, while likelihood has been used predominantly as a tool for statistical modelling and inference given observed data. Here we show that confidence is in fact an extended likelihood, thus giving a much closer correspondence between the two concepts. This result gives the confidence concept an external meaning outside the confidence-interval context, and vice versa, it gives the confidence interpretation to the likelihood. In addition to the obvious interpretation purposes, this connection suggests two-way transfers of technical information. For example, the extended likelihood theory gives a clear way to update or combine confidence information. On the other hand, the confidence connection gives the extended likelihood direct access to the frequentist probability, an objective certification not directly available to the classical likelihood. This implies that intervals derived from the extended likelihood have the same logical status as confidence intervals, thus simplifying the terminology in the inference of random parameters.
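As a quick illustration of the correspondence described above (a textbook example of my own, not a derivation from the paper), take a single observation $y \sim N(\theta, \sigma^2)$ with $\sigma$ known. The confidence distribution built from one-sided upper confidence limits has density equal to the normalized likelihood, because the normal density is symmetric:

```latex
% Confidence distribution and its density for the normal-mean example
% (illustrative; sigma is assumed known).
\[
  C(\theta) = \Phi\!\left(\frac{\theta - y}{\sigma}\right),
  \qquad
  c(\theta) = C'(\theta)
            = \frac{1}{\sigma}\,\phi\!\left(\frac{\theta - y}{\sigma}\right)
            = \frac{1}{\sigma}\,\phi\!\left(\frac{y - \theta}{\sigma}\right)
            \;\propto\; L(\theta).
\]
```

In this simple case confidence and normalized likelihood coincide exactly; the paper's point is that an analogous correspondence holds more generally through the extended likelihood.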
Citations: 9
Gambler’s Ruin and the ICM
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-15 · DOI: 10.1214/21-sts826
P. Diaconis, S. Ethier
Consider gambler's ruin with three players, 1, 2, and 3, having initial capitals $A$, $B$, and $C$. At each round a pair of players is chosen (uniformly at random) and a fair coin flip is made resulting in the transfer of one unit between these two players. Eventually, one of the players is eliminated and the game continues with the remaining two. Let $\sigma \in S_3$ be the elimination order (e.g., $\sigma=132$ means player 1 is eliminated first, player 3 is eliminated second, and player 2 is left with $A+B+C$). We seek approximations (and exact formulas) for the probabilities $P_{A,B,C}(\sigma)$. One frequently used approximation, the independent chip model (ICM), is shown to be inadequate. A regression adjustment is proposed, which seems to give good approximations to the players' elimination order probabilities.
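The process and the ICM baseline are simple enough to check by simulation. The sketch below (mine, not the authors' code) plays the three-player game many times, tallies Monte Carlo frequencies of each elimination order $\sigma$, and compares them with an ICM estimate computed in the standard poker convention (survivor's then runner-up's chip shares); the paper's exact ICM formulation may differ in detail, and the initial capitals here are arbitrary.

```python
import random
from collections import Counter

def elimination_order(capitals, rng):
    """Play one three-player gambler's ruin game; return (first out, second out, survivor)."""
    stacks = dict(capitals)
    out = []
    while len(stacks) > 1:
        i, j = rng.sample(sorted(stacks), 2)                 # pick a pair uniformly at random
        winner, loser = (i, j) if rng.random() < 0.5 else (j, i)
        stacks[winner] += 1
        stacks[loser] -= 1
        if stacks[loser] == 0:                               # ruined: eliminated, game goes on
            out.append(loser)
            del stacks[loser]
    out.append(next(iter(stacks)))                           # the last player standing
    return tuple(out)

def icm(capitals, sigma):
    """ICM estimate of P(sigma), poker convention: P(survivor finishes 1st) * P(runner-up finishes 2nd)."""
    first_out, second_out, survivor = sigma
    total = sum(capitals.values())
    return capitals[survivor] / total * capitals[second_out] / (total - capitals[survivor])

rng = random.Random(0)
caps = {1: 4, 2: 3, 3: 2}                                    # A=4, B=3, C=2 (illustrative)
n = 200_000
freq = Counter(elimination_order(caps, rng) for _ in range(n))
for sigma in sorted(freq):
    print(sigma, f"MC={freq[sigma] / n:.4f}", f"ICM={icm(caps, sigma):.4f}")
```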
Citations: 6
Methods to Compute Prediction Intervals: A Review and New Results
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-05 · DOI: 10.1214/21-sts842
Qinglong Tian, D. Nordman, W. Meeker
This paper reviews two main types of prediction interval methods under a parametric framework. First, we describe methods based on an (approximate) pivotal quantity. Examples include the plug-in, pivotal, and calibration methods. Then we describe methods based on a predictive distribution (sometimes derived based on the likelihood). Examples include Bayesian, fiducial, and direct-bootstrap methods. Several examples involving continuous distributions along with simulation studies to evaluate coverage probability properties are provided. We provide specific connections among different prediction interval methods for the (log-)location-scale family of distributions. This paper also discusses general prediction interval methods for discrete data, using the binomial and Poisson distributions as examples. We also overview methods for dependent data, with application to time series, spatial data, and Markov random fields, for example.
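To make the distinction between the first two ideas concrete, here is a small sketch (my own, using a normal sample rather than any example from the paper) contrasting a naive plug-in interval with the exact pivotal interval for a single future observation; the plug-in version ignores parameter-estimation error and tends to undercover when n is small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=20)          # observed sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
alpha = 0.05

# Plug-in: pretend (xbar, s) are the true mean and sd of the future observation.
z = stats.norm.ppf(1 - alpha / 2)
plug_in = (xbar - z * s, xbar + z * s)

# Pivotal: (Y_new - xbar) / (s * sqrt(1 + 1/n)) is exactly t_{n-1} for normal data,
# so this interval has exact 1 - alpha coverage.
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t * s * np.sqrt(1 + 1 / n)
pivotal = (xbar - half, xbar + half)

print(f"plug-in : ({plug_in[0]:.3f}, {plug_in[1]:.3f})")
print(f"pivotal : ({pivotal[0]:.3f}, {pivotal[1]:.3f})")
```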
Citations: 8
A Look at Robustness and Stability of $\ell_{1}$- versus $\ell_{0}$-Regularization: Discussion of Papers by Bertsimas et al. and Hastie et al.
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-01 · DOI: 10.1214/20-sts809
Yuansi Chen, Armeen Taeb, P. Bühlmann
{"title":"A Look at Robustness and Stability of $ell_{1}$-versus $ell_{0}$-Regularization: Discussion of Papers by Bertsimas et al. and Hastie et al.","authors":"Yuansi Chen, Armeen Taeb, P. Bühlmann","doi":"10.1214/20-sts809","DOIUrl":"https://doi.org/10.1214/20-sts809","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"614-622"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45899156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Rejoinder: Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-01 · DOI: 10.1214/20-sts733rej
T. Hastie, R. Tibshirani, R. Tibshirani
{"title":"Rejoinder: Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons","authors":"T. Hastie, R. Tibshirani, R. Tibshirani","doi":"10.1214/20-sts733rej","DOIUrl":"https://doi.org/10.1214/20-sts733rej","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"579-592"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44684797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Modern Variable Selection in Action: Comment on the Papers by HTT and BPV
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-01 · DOI: 10.1214/20-sts808
E. George
Let me begin by congratulating the authors of these two papers, hereafter HTT and BPV, for their superb contributions to the comparisons of methods for variable selection problems in high dimensional regression. The methods considered are truly some of today’s leading contenders for coping with the size and complexity of big data problems of so much current importance. Not surprisingly, there is no clear winner here because the terrain of comparisons is so vast and complex, and no single method can dominate across all situations. The considered setups vary greatly in terms of the number of observations n, the number of predictors p, the number and relative sizes of the underlying nonzero regression coefficients, predictor correlation structures and signal-to-noise ratios (SNRs). And even these only scratch the surface of the infinite possibilities. Further, there is the additional issue as to which performance measure is most important. Is the goal of an analysis exact variable selection or prediction or both? And what about computational speed and scalability? All these considerations would naturally depend on the practical application at hand. The methods compared by HTT and BPV have been unleashed by extraordinary developments in computational speed, and so it is tempting to distinguish them primarily by their novel implementation algorithms. In particular, the recent integer optimization related algorithms for variable selection differ in fundamental ways from the now widely adopted coordinate ascent algorithms for the lasso related methods. Undoubtedly, the impressive improvements in computational speed unleashed by these algorithms are critical for the feasibility of practical applications. However, the more fundamental story behind the performance differences has to do with the differences between the criteria that their algorithms are seeking to optimize. In an important sense, they are being guided by different solutions to the general variable selection problem. Focusing first on the paper of HTT, its main thrust appears to have been kindled by the computational breakthrough of Bertsimas, King and Mazumder (2016) (hereafter BKM), which had proposed a mixed integer optimization …
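For readers who want to see what "varying n, p, sparsity, correlation structure and SNR" looks like in practice, here is a toy generator in that spirit (my own sketch; the defaults and the AR(1) correlation choice are illustrative assumptions, not the exact configurations used by HTT or BPV). SNR is taken as the ratio of signal variance $\beta^\top\Sigma\beta$ to noise variance.

```python
import numpy as np

def make_sparse_regression(n=500, p=100, s=5, rho=0.35, snr=3.0, seed=0):
    """Generate y = X beta + eps with AR(1)-correlated predictors,
    s nonzero unit coefficients, and noise scaled to hit a target SNR."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))        # AR(1) correlation matrix
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[rng.choice(p, size=s, replace=False)] = 1.0
    sigma = np.sqrt(beta @ Sigma @ beta / snr)                # noise sd so SNR = beta' Sigma beta / sigma^2
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y, beta

X, y, beta = make_sparse_regression()
print(X.shape, "design,", int((beta != 0).sum()), "nonzero coefficients")
```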
Citations: 2
Discussion of “Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons”
IF 5.7 · Zone 1 (Mathematics) · Q1 STATISTICS & PROBABILITY · Pub Date: 2020-11-01 · DOI: 10.1214/20-sts807
R. Mazumder
I warmly congratulate the authors Hastie, Tibshirani and Tibshirani (HTT); and Bertsimas, Pauphilet and Van Parys (BPV) for their excellent contributions and important perspectives on sparse regression. Due to space constraints, and my greater familiarity with the content and context of HTT (I have had numerous fruitful discussions with the authors regarding their work), I will focus my discussion on the HTT paper. HTT nicely articulate the relative merits of three canonical estimators in sparse regression: L0, L1 and (forward)stepwise selection. I am humbled that a premise of their work is an article I wrote with Bertsimas and King [4] (BKM). BKM showed that current Mixed Integer Optimization (MIO) algorithms allow us to compute best subsets solutions for problem instances (p ≈ 1000 features) much larger than a previous benchmark (software for best subsets in the R package leaps) that could only handle instances with p ≈ 30. HTT by extending and refining the experiments performed by BKM, have helped clarify and deepen our understanding of L0, L1 and stepwise regression. They raise several intriguing questions that perhaps deserve further attention from the wider statistics and optimization communities. In this commentary, I will focus on some of the key points discussed in HTT, with a bias toward some of the recent work I have been involved in. There is a large and rich body of work in high-dimensional statistics and related optimization techniques that I will not be able to discuss within the limited scope of my commentary.
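The scale gap Mazumder mentions (exhaustive best subsets via leaps topping out around p ≈ 30, MIO reaching p ≈ 1000) is easy to appreciate with a toy comparison on a small problem. The sketch below (my own; alpha and the data sizes are arbitrary) does brute-force L0 search by enumerating all size-k subsets, which is exactly the computation that blows up combinatorially, alongside an off-the-shelf lasso fit.

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                                   # three true nonzero coefficients
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def best_subset(X, y, k):
    """Exhaustive L0 search: the size-k subset minimizing the residual sum of squares.
    There are C(p, k) candidates, which is why plain enumeration stalls well before p ~ 1000."""
    best, best_rss = None, np.inf
    for S in itertools.combinations(range(X.shape[1]), k):
        cols = list(S)
        coef, _, _, _ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = np.sum((y - X[:, cols] @ coef) ** 2)
        if rss < best_rss:
            best, best_rss = S, rss
    return best

print("best subset (k=3):", best_subset(X, y, 3))
print("lasso support    :", np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_))
```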
Citations: 4