Valid post-selection inference in model-free linear regression

IF 3.2 · Zone 1 (Mathematics) · Q1 Statistics & Probability · Annals of Statistics · Pub Date: 2020-10-01 · DOI: 10.1214/19-AOS1917
Arun K. Kuchibhotla, L. Brown, A. Buja, Junhui Cai, E. George, Linda H. Zhao
Citations: 20

Abstract

S.1. Simulations Continued. The simulation setting in this section is the same as in Section 9. We first describe the reason for using the null situation $\beta_0 = 0_p$ in the model. If $\beta_0$ is an arbitrary non-zero vector, then, for fixed covariates, the $X_i Y_i$ cannot be identically distributed, and hence only (asymptotically) conservative inference is possible. In simulations, this conservativeness is confounded with simultaneity, so that the coverage becomes close to 1 (if not exactly 1). In the main manuscript, we have shown plots comparing our method with Berk et al. (2013) and selective inference. We label our confidence region $\hat{R}^{\dagger}_{n,M}$ (12) as “UPoSI,” the projected confidence region $\hat{B}_{n,M}$ (28) as “UPoSIBox,” and Berk et al. (2013) as “PoSI.” Tables 1, 2, and 3 show exact numbers for the comparison of our method with Berk et al. (2013). Note that the size of each dot in the row plot of Figure 9 indicates the proportion of confidence regions of that volume among same-sized models. In Settings A and B, the confidence region volumes of same-sized models are the same. In Setting C, the volumes of the confidence regions of Berk and PoSI Box enlarge (hence smaller $\log(\mathrm{Vol})/|M|$) if the last covariate is included. Tables 4 and 5 show the numbers for the comparison of our method with selective inference when the selection procedure is forward stepwise and LARS, respectively. Sample splitting is a simple procedure that provides valid inference after selection, as discussed in Section 1.3. We stress here that this is valid only for independent observations and that the model selected in the first split half could be different from the one selected in the full data. The comparison results with $n = 1000$, $p = 500$ and the selection methods forward stepwise, LARS, and BIC are summarized in Figure S.1. For sample splitting, we have used the Bonferroni correction to obtain simultaneous inference for all coefficients in a model. Table 6 shows the comparison of our method with sample splitting.
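The sample-splitting baseline with a Bonferroni correction, evaluated under the null situation $\beta_0 = 0_p$, can be illustrated with a minimal sketch. This is not the authors' simulation code. The selector (top-`k` absolute marginal correlation), the sandwich standard errors, the random Gaussian design, the smaller sizes `n = 400`, `p = 50`, and the function names `split_sample_bonferroni` and `simulate_coverage` are all assumptions made for illustration; the abstract itself uses forward stepwise, LARS, and BIC with $n = 1000$, $p = 500$.

```python
import numpy as np
from statistics import NormalDist


def split_sample_bonferroni(X, y, k=3, alpha=0.05, seed=0):
    """Select k covariates on one half, then form Bonferroni intervals on the other."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]

    # Selection step (first half only). Assumed stand-in selector: keep the k
    # covariates with the largest absolute marginal correlation with y.
    Xc = X[first] - X[first].mean(axis=0)
    yc = y[first] - y[first].mean()
    score = np.abs(Xc.T @ yc) / np.linalg.norm(Xc, axis=0)
    model = np.sort(np.argsort(score)[-k:])

    # Inference step (second half only): least squares on the selected columns.
    X2, y2 = X[second][:, model], y[second]
    bread = np.linalg.inv(X2.T @ X2)
    beta_hat = bread @ X2.T @ y2
    resid = y2 - X2 @ beta_hat

    # Heteroskedasticity-robust (sandwich) standard errors.
    meat = (X2 * resid[:, None] ** 2).T @ X2
    se = np.sqrt(np.diag(bread @ meat @ bread))

    # Bonferroni: divide alpha across the k coefficients of the selected model.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    return model, beta_hat - z * se, beta_hat + z * se


def simulate_coverage(n=400, p=50, k=3, reps=200, alpha=0.05, seed=1):
    """Fraction of replications in which all k intervals contain the target 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for r in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # null situation: beta_0 = 0_p
        _, lo, hi = split_sample_bonferroni(X, y, k=k, alpha=alpha, seed=r)
        hits += bool(np.all((lo <= 0.0) & (0.0 <= hi)))
    return hits / reps


if __name__ == "__main__":
    print("simultaneous coverage:", simulate_coverage())
```

Because the response is independent of every covariate under the null, the target coefficient vector is zero for any selected model, so simultaneous coverage reduces to checking that all `k` intervals contain 0. Selection and inference use disjoint halves, which is what makes the second-half intervals valid for independent observations.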
Source journal
Annals of Statistics (Mathematics: Statistics & Probability)
CiteScore: 9.30
Self-citation rate: 8.90%
Articles published: 119
Review time: 6-12 weeks
Journal description: The Annals of Statistics aim to publish research papers of highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied & interdisciplinary statistics. Of course many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics, and in substantive scientific fields.
Latest articles from this journal
ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS.
RANK-BASED INDICES FOR TESTING INDEPENDENCE BETWEEN TWO HIGH-DIMENSIONAL VECTORS.
Single index Fréchet regression
Graphical models for nonstationary time series
On lower bounds for the bias-variance trade-off