Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.

Statistical Science · Impact Factor 3.9 · CAS Tier 1 (Mathematics) · JCR Q1 (Statistics & Probability). Published: 2024-11-01 (Epub: 2024-10-30). DOI: 10.1214/24-sts936
Hani Doss, Antonio Linero
{"title":"Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.","authors":"Hani Doss, Antonio Linero","doi":"10.1214/24-sts936","DOIUrl":null,"url":null,"abstract":"<p><p>Consider a Bayesian setup in which we observe <math><mi>Y</mi></math> , whose distribution depends on a parameter <math><mi>θ</mi></math> , that is, <math><mi>Y</mi> <mo>∣</mo> <mi>θ</mi> <mspace></mspace> <mo>~</mo> <mspace></mspace> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mo>∣</mo> <mi>θ</mi></mrow> </msub> </math> . The parameter <math><mi>θ</mi></math> is unknown and treated as random, and a prior distribution chosen from some parametric family <math> <mfenced> <mrow> <msub><mrow><mi>π</mi></mrow> <mrow><mi>θ</mi></mrow> </msub> <mo>(</mo> <mo>⋅</mo> <mo>;</mo> <mi>h</mi> <mo>)</mo> <mo>,</mo> <mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></mrow> </mfenced> </math> , is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about <math><mi>θ</mi></math> , but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on <math><mi>θ</mi></math> is estimated from the data. This is usually done by choosing the value of the hyperparameter <math><mi>h</mi></math> that maximizes some criterion. Arguably the most common way of doing this is to let <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the marginal likelihood of <math><mi>h</mi></math> , that is, <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo> <mo>=</mo> <mo>∫</mo> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mspace></mspace> <mo>∣</mo> <mspace></mspace> <mi>θ</mi></mrow> </msub> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mo>(</mo> <mi>θ</mi> <mo>)</mo> <mspace></mspace> <mi>d</mi> <mi>θ</mi></math> , and choose the value of <math><mi>h</mi></math> that maximizes <math><mi>m</mi> <mo>(</mo> <mo>⋅</mo> <mo>)</mo></math> . Unfortunately, except for a handful of textbook examples, analytic evaluation of <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of <math><mi>h</mi></math> , the dimension of <math><mi>θ</mi></math> , or both. Second, we present a method for estimating <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let <math><mi>g</mi></math> be a real-valued function of <math><mi>θ</mi></math> , and let <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the posterior expectation of <math><mi>g</mi> <mo>(</mo> <mi>θ</mi> <mo>)</mo></math> when the prior is <math> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> </math> . As a byproduct of our approach, we show how to obtain point estimates and globally-valid confidence bands for the family <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , <math><mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></math> . 
To illustrate the scope of our methodology we provide three detailed examples, having different characters.</p>","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"39 4","pages":"601-622"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Science","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/24-sts936","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/30 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

Consider a Bayesian setup in which we observe $Y$, whose distribution depends on a parameter $\theta$, that is, $Y \mid \theta \sim \pi_{Y \mid \theta}$. The parameter $\theta$ is unknown and treated as random, and a prior distribution chosen from some parametric family $\{\pi_{\theta}(\cdot\,; h),\ h \in \mathcal{H}\}$ is to be placed on it; below we write $\nu_h$ for $\pi_{\theta}(\cdot\,; h)$. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about $\theta$, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on $\theta$ is estimated from the data. This is usually done by choosing the value of the hyperparameter $h$ that maximizes some criterion. Arguably the most common way of doing this is to let $m(h)$ be the marginal likelihood of $h$, that is, $m(h) = \int \pi_{Y \mid \theta}\, \nu_h(\theta)\, d\theta$, and to choose the value of $h$ that maximizes $m(\cdot)$. Unfortunately, except for a handful of textbook examples, analytic evaluation of $\operatorname{argmax}_h m(h)$ is not feasible. The purpose of this paper is twofold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or do not scale well with the dimension of $h$, the dimension of $\theta$, or both. Second, we present a method for estimating $\operatorname{argmax}_h m(h)$, based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let $g$ be a real-valued function of $\theta$, and let $I(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_h$. As a byproduct of our approach, we show how to obtain point estimates and globally valid confidence bands for the family $I(h)$, $h \in \mathcal{H}$. To illustrate the scope of our methodology we provide three detailed examples, having different characters.
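To make the reweighting idea concrete, the sketch below works through a minimal example in a conjugate normal–normal model. It illustrates the standard identity $m(h)/m(h_1) = \mathrm{E}\left[\nu_h(\theta)/\nu_{h_1}(\theta)\right]$, where the expectation is with respect to the posterior under a fixed baseline prior $\nu_{h_1}$; it is not a reproduction of the paper's estimator. The toy model, the baseline $h_1$, the grid of $h$ values, and all variable names are assumptions made for the example. Because the model is conjugate, the posterior is sampled directly, standing in for the MCMC draws one would use in practice.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's
# implementation) of estimating argmax_h m(h) and the sensitivity curve
# I(h) by reweighting posterior draws taken under one baseline prior.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Y_i | theta ~ N(theta, sigma2), i.i.d.
sigma2 = 1.0
y = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=50)
n, ybar = y.size, y.mean()

# Prior family nu_h: theta ~ N(h, tau2); the hyperparameter h is the prior mean.
tau2 = 4.0

def log_nu(theta, h):
    # Log prior density up to an additive constant; the constant is the
    # same for every h (tau2 is fixed), so it cancels in the ratios below.
    return -0.5 * (theta - h) ** 2 / tau2

# Posterior under a fixed baseline h1. The model is conjugate, so we can
# sample the posterior directly; in a realistic problem these draws
# would come from an MCMC run under nu_h1.
h1 = 0.0
post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
post_mean = post_var * (n * ybar / sigma2 + h1 / tau2)
theta = rng.normal(post_mean, np.sqrt(post_var), size=10_000)

# Key identity: m(h)/m(h1) = E[ nu_h(theta)/nu_h1(theta) ], with the
# expectation over the posterior under h1. One set of draws serves every h.
grid = np.linspace(-2.0, 5.0, 301)
log_w = log_nu(theta[:, None], grid[None, :]) - log_nu(theta, h1)[:, None]
w = np.exp(log_w)                # prior ratios, shape (draws, grid points)
B = w.mean(axis=0)               # Monte Carlo estimates of m(h)/m(h1)
h_hat = grid[np.argmax(B)]       # empirical Bayes estimate of h

# The same weights give the sensitivity curve I(h) = E[g(theta) | y, h]
# by self-normalized importance sampling; here g(theta) = theta.
I = (theta[:, None] * w).mean(axis=0) / B

# Conjugate check: m(h) is the N(h, tau2 + sigma2/n) density at ybar,
# so argmax_h m(h) = ybar exactly; h_hat should land close to it.
print(f"h_hat = {h_hat:.3f}  (exact argmax = ybar = {ybar:.3f})")
print(f"I(h_hat) = {I[np.argmax(B)]:.3f}")
```

Note the scaling feature this construction exploits: a single set of draws under $\nu_{h_1}$ is reused for every $h$ on the grid, so scanning $\mathcal{H}$ costs no additional simulation. The self-normalized curve $I(h)$ above comes with no uncertainty quantification; constructing globally valid confidence bands for $\{I(h),\ h \in \mathcal{H}\}$ is precisely the part that requires the paper's machinery, which this sketch does not attempt.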

Source journal: Statistical Science (Mathematics – Statistics & Probability)
CiteScore: 6.50 · Self-citation rate: 1.80% · Annual articles: 40 · Review time: >12 weeks

About the journal: The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.