Doubly robust and computationally efficient high-dimensional variable selection

Abhinav Chakraborty, Jeffrey Zhang, Eugene Katsevich
arXiv - STAT - Methodology. Published 2024-09-14. DOI: arxiv-2409.09512 (https://doi.org/arxiv-2409.09512)
Citations: 0

Abstract

The variable selection problem is to discover which of a large set of predictors is associated with an outcome of interest, conditionally on the other predictors. This problem has been widely studied, but existing approaches lack at least one of the following: power against complex alternatives, robustness to model misspecification, computational efficiency, or quantification of evidence against individual hypotheses. We present tower PCM (tPCM), a statistically and computationally efficient solution to the variable selection problem that does not suffer from these shortcomings. tPCM adapts the best aspects of two existing procedures that are based on similar functionals: the holdout randomization test (HRT) and the projected covariance measure (PCM). The former is a model-X test that uses many resamples and few machine learning fits, while the latter is an asymptotic, doubly robust-style test for a single hypothesis that requires no resamples and many machine learning fits. Theoretically, we demonstrate the validity of tPCM and, perhaps surprisingly, the asymptotic equivalence of HRT, PCM, and tPCM. In so doing, we clarify the relationship between two methods from two separate literatures. An extensive simulation study verifies that tPCM can have significant computational savings compared to HRT and PCM, while maintaining nearly identical power.
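To make the "many resamples, few machine learning fits" description of the HRT concrete, here is a minimal Python sketch of the general holdout randomization idea. It is not tPCM and not the authors' implementation. Several pieces are assumptions made for illustration: ordinary least squares stands in for the machine learning method, the conditional distribution of the tested predictor given the others is taken as known (independent standard normal here), and the names `hrt_pvalue` and `sample_xj` are hypothetical.

```python
import numpy as np


def hrt_pvalue(X_train, y_train, X_test, y_test, j, sample_xj,
               n_resamples=200, rng=None):
    """Illustrative holdout randomization test p-value for predictor j."""
    rng = np.random.default_rng(rng)

    # Single model fit on the training half (OLS with intercept as a stand-in).
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

    def holdout_mse(X):
        A = np.column_stack([np.ones(len(X)), X])
        return np.mean((y_test - A @ beta) ** 2)

    observed = holdout_mse(X_test)

    # Many cheap resamples: redraw column j from its assumed conditional law
    # and re-evaluate the already-fitted model. No refitting is needed.
    null_losses = np.empty(n_resamples)
    for b in range(n_resamples):
        X_null = X_test.copy()
        X_null[:, j] = sample_xj(X_test, rng)
        null_losses[b] = holdout_mse(X_null)

    # Small p-value when the real column predicts y better than resampled copies.
    return (1 + np.sum(null_losses <= observed)) / (1 + n_resamples)


# Toy data (assumed setup): independent N(0, 1) predictors, only column 0 matters.
rng = np.random.default_rng(0)
n, p = 400, 5
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]


def draw_indep_normal(X_cur, gen):
    # With independent predictors, X_j given the rest is simply N(0, 1).
    return gen.standard_normal(len(X_cur))


p_signal = hrt_pvalue(X_tr, y_tr, X_te, y_te, 0, draw_indep_normal, rng=1)
p_null = hrt_pvalue(X_tr, y_tr, X_te, y_te, 1, draw_indep_normal, rng=2)
print(p_signal, p_null)
```

In this toy setup the p-value for the truly relevant predictor should be small, while the p-value for an irrelevant predictor carries no systematic signal. The model is fit once and only the holdout loss is recomputed per resample, which is the trade-off the abstract attributes to the HRT.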