Reproducible Model Selection Using Bagged Posteriors.

Impact factor: 2.5. Zone 2 (Mathematics). JCR Q1: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS. Journal: Bayesian Analysis. Pub Date: 2023-03-01. DOI: 10.1214/21-ba1301

Jonathan H Huggins, Jeffrey W Miller

Bayesian Analysis, volume 18, issue 1, pages 79-104. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9838736/pdf/nihms-1796997.pdf

Citations: 10

Abstract

Bayesian model selection is premised on the assumption that the data are generated from one of the postulated models. However, in many applications, all of these models are incorrect (that is, there is misspecification). When the models are misspecified, two or more models can provide a nearly equally good fit to the data, in which case Bayesian model selection can be highly unstable, potentially leading to self-contradictory findings. To remedy this instability, we propose to use bagging on the posterior distribution ("BayesBag") - that is, to average the posterior model probabilities over many bootstrapped datasets. We provide theoretical results characterizing the asymptotic behavior of the posterior and the bagged posterior in the (misspecified) model selection setting. We empirically assess the BayesBag approach on synthetic and real-world data in (i) feature selection for linear regression and (ii) phylogenetic tree reconstruction. Our theory and experiments show that, when all models are misspecified, BayesBag (a) provides greater reproducibility and (b) places posterior mass on optimal models more reliably, compared to the usual Bayesian posterior; on the other hand, under correct specification, BayesBag is slightly more conservative than the usual posterior, in the sense that BayesBag posterior probabilities tend to be slightly farther from the extremes of zero and one. Overall, our results demonstrate that BayesBag provides an easy-to-use and widely applicable approach that improves upon Bayesian model selection by making it more stable and reproducible.
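The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy feature-selection setting with a conjugate Gaussian prior on the coefficients, a known noise variance, a uniform prior over models, and bootstrap datasets of the same size as the original; the function names, the hyperparameters `sigma2` and `tau2`, and the synthetic data are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def log_marginal_likelihood(X, y, subset, sigma2=1.0, tau2=1.0):
    """Log evidence of y for the linear model using the columns in `subset`.

    Assumed model (illustrative only): coefficients ~ N(0, tau2 * I) and
    noise ~ N(0, sigma2 * I), so marginally y ~ N(0, sigma2*I + tau2*Xs Xs^T).
    """
    n = len(y)
    Xs = X[:, list(subset)]
    cov = sigma2 * np.eye(n) + tau2 * (Xs @ Xs.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def posterior_model_probs(X, y, subsets, **hyper):
    """Standard posterior model probabilities under a uniform model prior."""
    logml = np.array(
        [log_marginal_likelihood(X, y, s, **hyper) for s in subsets]
    )
    weights = np.exp(logml - logml.max())  # subtract max for stability
    return weights / weights.sum()


def bayesbag_model_probs(X, y, subsets, n_boot=50, seed=0, **hyper):
    """Average posterior model probabilities over bootstrapped datasets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    total = np.zeros(len(subsets))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        total += posterior_model_probs(X[idx], y[idx], subsets, **hyper)
    return total / n_boot


# Synthetic, deliberately misspecified data: two nearly collinear features
# and heavy-tailed noise, so several models fit almost equally well.
rng = np.random.default_rng(1)
n = 60
x0 = rng.normal(size=n)
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
y = x0 + rng.standard_t(df=3, size=n)

# Candidate models: every subset of the three features (including none).
subsets = [
    s for k in range(X.shape[1] + 1)
    for s in itertools.combinations(range(X.shape[1]), k)
]

standard = posterior_model_probs(X, y, subsets)
bagged = bayesbag_model_probs(X, y, subsets, n_boot=50, seed=0)

for s, p_std, p_bag in zip(subsets, standard, bagged):
    print(f"features {s}: standard={p_std:.3f}  bagged={p_bag:.3f}")
```

Comparing the two columns across repeated draws of the synthetic data gives a rough sense of how stable each set of probabilities is; the bootstrap size and the number of bootstrap replicates are tuning choices in this sketch rather than recommendations.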

Source journal

Bayesian Analysis (Mathematics: Mathematics, Interdisciplinary Applications)

CiteScore

6.50

Self-citation rate

13.60%

Number of articles published

Review time

>12 weeks

Journal description: Bayesian Analysis is an electronic journal of the International Society for Bayesian Analysis. It seeks to publish a wide range of articles that demonstrate or discuss Bayesian methods in some theoretical or applied context. The journal welcomes submissions involving presentation of new computational and statistical methods; critical reviews and discussions of existing approaches; historical perspectives; description of important scientific or policy application areas; case studies; and methods for experimental design, data collection, data sharing, or data mining. Evaluation of submissions is based on importance of content and effectiveness of communication. Discussion papers are typically chosen by the Editor in Chief, or suggested by an Editor, from among the regular submissions. In addition, the journal encourages individual authors to submit manuscripts for consideration as discussion papers.