Reproducible Model Selection Using Bagged Posteriors.

IF 4.9 | Zone 2 (Mathematics) | Q1, MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Bayesian Analysis | Pub Date: 2023-03-01 | DOI: 10.1214/21-ba1301
Jonathan H Huggins, Jeffrey W Miller
Citations: 10

Abstract

Bayesian model selection is premised on the assumption that the data are generated from one of the postulated models. However, in many applications, all of these models are incorrect (that is, there is misspecification). When the models are misspecified, two or more models can provide a nearly equally good fit to the data, in which case Bayesian model selection can be highly unstable, potentially leading to self-contradictory findings. To remedy this instability, we propose to use bagging on the posterior distribution ("BayesBag") - that is, to average the posterior model probabilities over many bootstrapped datasets. We provide theoretical results characterizing the asymptotic behavior of the posterior and the bagged posterior in the (misspecified) model selection setting. We empirically assess the BayesBag approach on synthetic and real-world data in (i) feature selection for linear regression and (ii) phylogenetic tree reconstruction. Our theory and experiments show that, when all models are misspecified, BayesBag (a) provides greater reproducibility and (b) places posterior mass on optimal models more reliably, compared to the usual Bayesian posterior; on the other hand, under correct specification, BayesBag is slightly more conservative than the usual posterior, in the sense that BayesBag posterior probabilities tend to be slightly farther from the extremes of zero and one. Overall, our results demonstrate that BayesBag provides an easy-to-use and widely applicable approach that improves upon Bayesian model selection by making it more stable and reproducible.
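
The bagging step described in the abstract (averaging posterior model probabilities over bootstrapped datasets) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear regression feature-selection setting with a known noise variance `sigma2`, a Gaussian prior on the coefficients with variance `tau2`, a uniform prior over models, and bootstrap datasets of size `m` (defaulting to the original sample size). All function names, parameter values, and the synthetic data are hypothetical choices made for the example.

```python
import itertools
import numpy as np


def log_marginal(X, y, sigma2=1.0, tau2=1.0):
    """Log marginal likelihood of y when y ~ N(0, sigma2*I + tau2*X X^T).

    Assumes known noise variance sigma2 and prior beta ~ N(0, tau2*I).
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def model_posterior(X, y, models):
    """Posterior model probabilities under a uniform prior over `models`."""
    logs = np.array([log_marginal(X[:, list(m)], y) for m in models])
    w = np.exp(logs - logs.max())  # subtract the max for numerical stability
    return w / w.sum()


def bayesbag(X, y, models, n_boot=50, m=None, seed=0):
    """Average the posterior model probabilities over bootstrapped datasets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = n if m is None else m  # bootstrap dataset size (assumed default: n)
    total = np.zeros(len(models))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=m)  # sample rows with replacement
        total += model_posterior(X[idx], y[idx], models)
    return total / n_boot


# Synthetic example: only feature 0 influences the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X[:, 0] + 0.5 * rng.normal(size=40)

# Candidate models: every non-empty subset of the three features.
models = [c for k in range(1, 4) for c in itertools.combinations(range(3), k)]

standard = model_posterior(X, y, models)  # usual posterior model probabilities
bagged = bayesbag(X, y, models)           # bagged ("BayesBag") probabilities
for mdl, p_std, p_bag in zip(models, standard, bagged):
    print(mdl, round(float(p_std), 3), round(float(p_bag), 3))
```

Comparing the two printed columns across repeated draws of the synthetic data is one way to explore the stability behavior the abstract describes; this sketch alone does not establish those results.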

Source journal
Bayesian Analysis (Mathematics, Interdisciplinary Applications)
CiteScore: 6.50
Self-citation rate: 13.60%
Articles published: 59
Review time: >12 weeks
About the journal: Bayesian Analysis is an electronic journal of the International Society for Bayesian Analysis. It seeks to publish a wide range of articles that demonstrate or discuss Bayesian methods in some theoretical or applied context. The journal welcomes submissions involving presentation of new computational and statistical methods; critical reviews and discussions of existing approaches; historical perspectives; description of important scientific or policy application areas; case studies; and methods for experimental design, data collection, data sharing, or data mining. Evaluation of submissions is based on importance of content and effectiveness of communication. Discussion papers are typically chosen by the Editor in Chief, or suggested by an Editor, among the regular submissions. In addition, the Journal encourages individual authors to submit manuscripts for consideration as discussion papers.