Post-model-selection inference in linear regression models: An integrated review

Statistics Surveys (Statistics & Probability, JCR Q1, IF 11) | Pub Date: 2022-01-01 | DOI: 10.1214/22-ss135

Dongliang Zhang, Abbas Khalili, M. Asgharian


Citations: 12


Post-model-selection inference in linear regression models: An integrated review

The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of naïve confidence intervals. Categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods.

MSC2020 subject classifications: Primary 62F25; secondary 62J07.
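The coverage problem the abstract refers to can be illustrated with a small Monte Carlo sketch. This is not the simulated example from the paper; it is a minimal illustration under assumed settings: two correlated Gaussian predictors, a selection rule that keeps the second predictor only when its t-statistic in the full model exceeds 1.96 in absolute value, and a naïve 95% interval for the population coefficient of the first predictor computed from whichever model was selected. The function names `ols` and `simulate_coverage`, the parameter values, and the normal critical value 1.96 are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares without intercept: coefficients and standard errors."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta, se


def simulate_coverage(n=100, rho=0.7, beta1=1.0, beta2=0.2,
                      reps=2000, z=1.96, seed=0):
    """Estimate coverage of the full-model and the naive post-selection
    interval for beta1 (all settings here are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    hits_full = 0
    hits_naive = 0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ np.array([beta1, beta2]) + rng.standard_normal(n)

        # Interval from the full model, with no selection step.
        b_full, se_full = ols(X, y)
        hits_full += int(abs(b_full[0] - beta1) <= z * se_full[0])

        # Data-driven selection: keep x2 only if it looks significant.
        if abs(b_full[1] / se_full[1]) > z:
            b1, se1 = b_full[0], se_full[0]
        else:
            b_sub, se_sub = ols(X[:, :1], y)
            b1, se1 = b_sub[0], se_sub[0]

        # Naive interval: treats the selected model as if it were fixed in advance.
        hits_naive += int(abs(b1 - beta1) <= z * se1)
    return hits_full / reps, hits_naive / reps
```

Calling `full_cov, naive_cov = simulate_coverage()` returns the two empirical coverage rates. Under these assumed settings one would expect the full-model interval to stay close to the nominal 95% level, while the naïve post-selection interval falls noticeably below it, because dropping the second predictor biases the estimate of the first coefficient and the naïve standard error ignores the selection step.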


Source journal

Statistics Surveys (Statistics & Probability)

CiteScore: 11.70

Self-citation rate: 0.00%

Number of articles published:

Journal description: Statistics Surveys publishes survey articles in theoretical, computational, and applied statistics. The style of articles may range from reviews of recent research to graduate textbook exposition. Articles may be broad or narrow in scope. The essential requirements are a well specified topic and target audience, together with clear exposition. Statistics Surveys is sponsored by the American Statistical Association, the Bernoulli Society, the Institute of Mathematical Statistics, and by the Statistical Society of Canada.

Latest articles from this journal

White noise testing for functional time series

Spline local basis methods for nonparametric density estimation

Core-periphery structure in networks: A statistical exposition

Kronecker-structured covariance models for multiway data

A brief and understandable guide to pseudo-random number generators and specific models for security