{"title":"Introduction to the Special Section","authors":"Yihong Wu, Harrison H. Zhou","doi":"10.1214/20-sts361ed","DOIUrl":"https://doi.org/10.1214/20-sts361ed","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42468701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Conversation with Tze Leung Lai","authors":"Ying Lu, Dylan S. Small, Z. Ying","doi":"10.1214/20-sts775","DOIUrl":"https://doi.org/10.1214/20-sts775","url":null,"abstract":"This conversation began in June 2015 in the Department of Statistics at Columbia University during Lai’s visit to his alma mater where he celebrated his seventieth birthday. It continued in the subsequent years at Columbia and Stanford. Lai was born on June 28, 1945, in Hong Kong, where he grew up and attended The University of Hong Kong, receiving his B.A. degree (First Class Honors) in Mathematics in 1967. He went to Columbia University in 1968 for graduate study in statistics and received his Ph.D. degree in 1971. He stayed on the faculty at Columbia and was appointed Higgins Professor of Mathematical Statistics in 1986. A year later he moved to Stanford, where he is currently Ray Lyman Wilbur Professor of Statistics, and by courtesy, also of Biomedical Data Science and Computational and Mathematical Engineering. He is a fellow of the Institute of Mathematical Statistics, the American Statistical Association and an elected member of Academia Sinica in Taiwan. He was the third recipient of the COPSS Award which he won in 1983. He has been married to Letitia Chow since 1975, and they have two sons and two grandchildren.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42132267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Two Frameworks for Analyzing Longitudinal Data","authors":"Jie Zhou, Xiao Zhou, Liuquan Sun","doi":"10.1214/20-sts813","DOIUrl":"https://doi.org/10.1214/20-sts813","url":null,"abstract":"Under the random design of longitudinal data, observation times are irregular, and there are mainly two frameworks for analyzing such kind of longitudinal data. One is the clustered data framework and the other is the counting process framework. In this paper, we give a thorough comparison of these two frameworks in terms of data structure, model assumptions and estimation procedures. We find that modeling the observation times in the counting process framework will not gain any efficiency when the observation times are correlated with covariates but independent of the longitudinal response given covariates. Some simulation studies are conducted to compare the finite sample behaviors of the related estimators, and a real data analysis of the Alzheimer’s disease study is implemented for further comparison.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"1 1","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66085640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confidence as Likelihood","authors":"Y. Pawitan, Youngjo Lee","doi":"10.1214/20-sts811","DOIUrl":"https://doi.org/10.1214/20-sts811","url":null,"abstract":"Confidence and likelihood are fundamental statistical concepts with distinct technical interpretation and usage. Confidence is a meaningful concept of uncertainty within the context of confidence-interval procedure, while likelihood has been used predominantly as a tool for statistical modelling and inference given observed data. Here we show that confidence is in fact an extended likelihood, thus giving a much closer correspondence between the two concepts. This result gives the confidence concept an external meaning outside the confidence-interval context, and vice versa, it gives the confidence interpretation to the likelihood. In addition to the obvious interpretation purposes, this connection suggests two-way transfers of technical information. For example, the extended likelihood theory gives a clear way to update or combine confidence information. On the other hand, the confidence connection gives the extended likelihood direct access to the frequentist probability, an objective certification not directly available to the classical likelihood. This implies that intervals derived from the extended likelihood have the same logical status as confidence intervals, thus simplifying the terminology in the inference of random parameters.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"1 1","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66085566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gambler’s Ruin and the ICM","authors":"P. Diaconis, S. Ethier","doi":"10.1214/21-sts826","DOIUrl":"https://doi.org/10.1214/21-sts826","url":null,"abstract":"Consider gambler's ruin with three players, 1, 2, and 3, having initial capitals $A$, $B$, and $C$. At each round a pair of players is chosen (uniformly at random) and a fair coin flip is made resulting in the transfer of one unit between these two players. Eventually, one of the players is eliminated and the game continues with the remaining two. Let $sigmain S_3$ be the elimination order (e.g., $sigma=132$ means player 1 is eliminated first, player 3 is eliminated second, and player 2 is left with $A+B+C$). \u0000We seek approximations (and exact formulas) for the probabilities $P_{A,B,C}(sigma)$. One frequently used approximation, the independent chip model (ICM), is shown to be inadequate. A regression adjustment is proposed, which seems to give good approximations to the players' elimination order probabilities.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2020-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48579330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods to Compute Prediction Intervals: A Review and New Results","authors":"Qinglong Tian, D. Nordman, W. Meeker","doi":"10.1214/21-sts842","DOIUrl":"https://doi.org/10.1214/21-sts842","url":null,"abstract":"This paper reviews two main types of prediction interval methods under a parametric framework. First, we describe methods based on an (approximate) pivotal quantity. Examples include the plug-in, pivotal, and calibration methods. Then we describe methods based on a predictive distribution (sometimes derived based on the likelihood). Examples include Bayesian, fiducial, and direct-bootstrap methods. Several examples involving continuous distributions along with simulation studies to evaluate coverage probability properties are provided. We provide specific connections among different prediction interval methods for the (log-)location-scale family of distributions. This paper also discusses general prediction interval methods for discrete data, using the binomial and Poisson distributions as examples. We also overview methods for dependent data, with application to time series, spatial data, and Markov random fields, for example.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2020-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46757725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Look at Robustness and Stability of $ell_{1}$-versus $ell_{0}$-Regularization: Discussion of Papers by Bertsimas et al. and Hastie et al.","authors":"Yuansi Chen, Armeen Taeb, P. Bühlmann","doi":"10.1214/20-sts809","DOIUrl":"https://doi.org/10.1214/20-sts809","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"614-622"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45899156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rejoinder: Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons","authors":"T. Hastie, R. Tibshirani, R. Tibshirani","doi":"10.1214/20-sts733rej","DOIUrl":"https://doi.org/10.1214/20-sts733rej","url":null,"abstract":"","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"579-592"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44684797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modern Variable Selection in Action: Comment on the Papers by HTT and BPV","authors":"E. George","doi":"10.1214/20-sts808","DOIUrl":"https://doi.org/10.1214/20-sts808","url":null,"abstract":"Let me begin by congratulating the authors of these two papers, hereafter HTT and BPV, for their superb contributions to the comparisons of methods for variable selection problems in high dimensional regression. The methods considered are truly some of today’s leading contenders for coping with the size and complexity of big data problems of so much current importance. Not surprisingly, there is no clear winner here because the terrain of comparisons is so vast and complex, and no single method can dominate across all situations. The considered setups vary greatly in terms of the number of observations n, the number of predictors p, the number and relative sizes of the underlying nonzero regression coefficients, predictor correlation structures and signal-to-noise ratios (SNRs). And even these only scratch the surface of the infinite possibilities. Further, there is the additional issue as to which performance measure is most important. Is the goal of an analysis exact variable selection or prediction or both? And what about computational speed and scalability? All these considerations would naturally depend on the practical application at hand. The methods compared by HTT and BPV have been unleashed by extraordinary developments in computational speed, and so it is tempting to distinguish them primarily by their novel implementation algorithms. In particular, the recent integer optimization related algorithms for variable selection differ in fundamental ways from the now widely adopted coordinate ascent algorithms for the lasso related methods. Undoubtedly, the impressive improvements in computational speed unleashed by these algorithms are critical for the feasibility of practical applications. However, the more fundamental story behind the performance differences has to do with the differences between the criteria that their algorithms are seeking to optimize. In an important sense, they are being guided by different solutions to the general variable selection problem. Focusing first on the paper of HTT, its main thrust appears to have been kindled by the computational breakthrough of Bertsimas, King and Mazumder (2016) (hereafter BKM), which had proposed a mixed integer opti-","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"609-613"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45250262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discussion of “Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons”","authors":"R. Mazumder","doi":"10.1214/20-sts807","DOIUrl":"https://doi.org/10.1214/20-sts807","url":null,"abstract":"I warmly congratulate the authors Hastie, Tibshirani and Tibshirani (HTT); and Bertsimas, Pauphilet and Van Parys (BPV) for their excellent contributions and important perspectives on sparse regression. Due to space constraints, and my greater familiarity with the content and context of HTT (I have had numerous fruitful discussions with the authors regarding their work), I will focus my discussion on the HTT paper. HTT nicely articulate the relative merits of three canonical estimators in sparse regression: L0, L1 and (forward)stepwise selection. I am humbled that a premise of their work is an article I wrote with Bertsimas and King [4] (BKM). BKM showed that current Mixed Integer Optimization (MIO) algorithms allow us to compute best subsets solutions for problem instances (p ≈ 1000 features) much larger than a previous benchmark (software for best subsets in the R package leaps) that could only handle instances with p ≈ 30. HTT by extending and refining the experiments performed by BKM, have helped clarify and deepen our understanding of L0, L1 and stepwise regression. They raise several intriguing questions that perhaps deserve further attention from the wider statistics and optimization communities. In this commentary, I will focus on some of the key points discussed in HTT, with a bias toward some of the recent work I have been involved in. There is a large and rich body of work in high-dimensional statistics and related optimization techniques that I will not be able to discuss within the limited scope of my commentary.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"35 1","pages":"602-608"},"PeriodicalIF":5.7,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47846338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}