Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.

Statistical Science · Impact Factor 3.9 · CAS Tier 1 (Mathematics) · JCR Q1 (Statistics & Probability). Published: 2024-11-01 (Epub: 2024-10-30). DOI: 10.1214/24-sts936
Hani Doss, Antonio Linero
{"title":"Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis.","authors":"Hani Doss, Antonio Linero","doi":"10.1214/24-sts936","DOIUrl":null,"url":null,"abstract":"<p><p>Consider a Bayesian setup in which we observe <math><mi>Y</mi></math> , whose distribution depends on a parameter <math><mi>θ</mi></math> , that is, <math><mi>Y</mi> <mo>∣</mo> <mi>θ</mi> <mspace></mspace> <mo>~</mo> <mspace></mspace> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mo>∣</mo> <mi>θ</mi></mrow> </msub> </math> . The parameter <math><mi>θ</mi></math> is unknown and treated as random, and a prior distribution chosen from some parametric family <math> <mfenced> <mrow> <msub><mrow><mi>π</mi></mrow> <mrow><mi>θ</mi></mrow> </msub> <mo>(</mo> <mo>⋅</mo> <mo>;</mo> <mi>h</mi> <mo>)</mo> <mo>,</mo> <mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></mrow> </mfenced> </math> , is to be placed on it. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about <math><mi>θ</mi></math> , but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on <math><mi>θ</mi></math> is estimated from the data. This is usually done by choosing the value of the hyperparameter <math><mi>h</mi></math> that maximizes some criterion. Arguably the most common way of doing this is to let <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the marginal likelihood of <math><mi>h</mi></math> , that is, <math><mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo> <mo>=</mo> <mo>∫</mo> <msub><mrow><mi>π</mi></mrow> <mrow><mi>Y</mi> <mspace></mspace> <mo>∣</mo> <mspace></mspace> <mi>θ</mi></mrow> </msub> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mo>(</mo> <mi>θ</mi> <mo>)</mo> <mspace></mspace> <mi>d</mi> <mi>θ</mi></math> , and choose the value of <math><mi>h</mi></math> that maximizes <math><mi>m</mi> <mo>(</mo> <mo>⋅</mo> <mo>)</mo></math> . Unfortunately, except for a handful of textbook examples, analytic evaluation of <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or don't scale well with the dimension of <math><mi>h</mi></math> , the dimension of <math><mi>θ</mi></math> , or both. Second, we present a method for estimating <math><mi>a</mi> <mi>r</mi> <mi>g</mi> <mspace></mspace> <msub><mrow><mi>m</mi> <mi>a</mi> <mi>x</mi></mrow> <mrow><mi>h</mi></mrow> </msub> <mspace></mspace> <mi>m</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let <math><mi>g</mi></math> be a real-valued function of <math><mi>θ</mi></math> , and let <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> be the posterior expectation of <math><mi>g</mi> <mo>(</mo> <mi>θ</mi> <mo>)</mo></math> when the prior is <math> <msub><mrow><mi>v</mi></mrow> <mrow><mi>h</mi></mrow> </msub> </math> . As a byproduct of our approach, we show how to obtain point estimates and globally-valid confidence bands for the family <math><mi>I</mi> <mo>(</mo> <mi>h</mi> <mo>)</mo></math> , <math><mi>h</mi> <mo>∈</mo> <mi>ℋ</mi></math> . 
To illustrate the scope of our methodology we provide three detailed examples, having different characters.</p>","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"39 4","pages":"601-622"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Science","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/24-sts936","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/30 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

Consider a Bayesian setup in which we observe $Y$, whose distribution depends on a parameter $\theta$, that is, $Y \mid \theta \sim \pi_{Y \mid \theta}$. The parameter $\theta$ is unknown and treated as random, and a prior distribution chosen from some parametric family $\{\pi_{\theta}(\cdot\,; h),\ h \in \mathcal{H}\}$ is to be placed on it; below we write $\nu_h$ for $\pi_{\theta}(\cdot\,; h)$. For the subjective Bayesian there is a single prior in the family which represents his or her beliefs about $\theta$, but determination of this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on $\theta$ is estimated from the data. This is usually done by choosing the value of the hyperparameter $h$ that maximizes some criterion. Arguably the most common way of doing this is to let $m(h)$ be the marginal likelihood of $h$, that is, $m(h) = \int \pi_{Y \mid \theta}\, \nu_h(\theta)\, d\theta$, and to choose the value of $h$ that maximizes $m(\cdot)$. Unfortunately, except for a handful of textbook examples, analytic evaluation of $\operatorname{argmax}_h m(h)$ is not feasible. The purpose of this paper is twofold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or do not scale well with the dimension of $h$, the dimension of $\theta$, or both. Second, we present a method for estimating $\operatorname{argmax}_h m(h)$, based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let $g$ be a real-valued function of $\theta$, and let $I(h)$ be the posterior expectation of $g(\theta)$ when the prior is $\nu_h$. As a byproduct of our approach, we show how to obtain point estimates and globally valid confidence bands for the family $I(h)$, $h \in \mathcal{H}$. To illustrate the scope of our methodology we provide three detailed examples, having different characters.
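To make the reweighting idea concrete, the sketch below works through a minimal example in a conjugate normal–normal model. It illustrates the standard identity $m(h)/m(h_1) = \mathrm{E}\left[\nu_h(\theta)/\nu_{h_1}(\theta)\right]$, where the expectation is with respect to the posterior under a fixed baseline prior $\nu_{h_1}$; it is not a reproduction of the paper's estimator. The toy model, the baseline $h_1$, the grid of $h$ values, and all variable names are assumptions made for the example. Because the model is conjugate, the posterior is sampled directly, standing in for the MCMC draws one would use in practice.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's
# implementation) of estimating argmax_h m(h) and the sensitivity curve
# I(h) by reweighting posterior draws taken under one baseline prior.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: Y_i | theta ~ N(theta, sigma2), i.i.d.
sigma2 = 1.0
y = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=50)
n, ybar = y.size, y.mean()

# Prior family nu_h: theta ~ N(h, tau2); the hyperparameter h is the prior mean.
tau2 = 4.0

def log_nu(theta, h):
    # Log prior density up to an additive constant; the constant is the
    # same for every h (tau2 is fixed), so it cancels in the ratios below.
    return -0.5 * (theta - h) ** 2 / tau2

# Posterior under a fixed baseline h1. The model is conjugate, so we can
# sample the posterior directly; in a realistic problem these draws
# would come from an MCMC run under nu_h1.
h1 = 0.0
post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
post_mean = post_var * (n * ybar / sigma2 + h1 / tau2)
theta = rng.normal(post_mean, np.sqrt(post_var), size=10_000)

# Key identity: m(h)/m(h1) = E[ nu_h(theta)/nu_h1(theta) ], with the
# expectation over the posterior under h1. One set of draws serves every h.
grid = np.linspace(-2.0, 5.0, 301)
log_w = log_nu(theta[:, None], grid[None, :]) - log_nu(theta, h1)[:, None]
w = np.exp(log_w)                # prior ratios, shape (draws, grid points)
B = w.mean(axis=0)               # Monte Carlo estimates of m(h)/m(h1)
h_hat = grid[np.argmax(B)]       # empirical Bayes estimate of h

# The same weights give the sensitivity curve I(h) = E[g(theta) | y, h]
# by self-normalized importance sampling; here g(theta) = theta.
I = (theta[:, None] * w).mean(axis=0) / B

# Conjugate check: m(h) is the N(h, tau2 + sigma2/n) density at ybar,
# so argmax_h m(h) = ybar exactly; h_hat should land close to it.
print(f"h_hat = {h_hat:.3f}  (exact argmax = ybar = {ybar:.3f})")
print(f"I(h_hat) = {I[np.argmax(B)]:.3f}")
```

Note the scaling feature this construction exploits: a single set of draws under $\nu_{h_1}$ is reused for every $h$ on the grid, so scanning $\mathcal{H}$ costs no additional simulation. The self-normalized curve $I(h)$ above comes with no uncertainty quantification; constructing globally valid confidence bands for $\{I(h),\ h \in \mathcal{H}\}$ is precisely the part that requires the paper's machinery, which this sketch does not attempt.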

Source journal: Statistical Science (Mathematics – Statistics & Probability)
CiteScore: 6.50 · Self-citation rate: 1.80% · Annual articles: 40 · Review time: >12 weeks

About the journal: The central purpose of Statistical Science is to convey the richness, breadth and unity of the field by presenting the full range of contemporary statistical thought at a moderate technical level, accessible to the wide community of practitioners, researchers and students of statistics and probability.