Bayesian Analysis最新文献

Quantum Speedups for Multiproposal MCMC. 多提案MCMC的量子加速。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-08-18 DOI: 10.1214/25-ba1546

Chin-Yi Lin, Kuo-Chin Chen, Philippe Lemey, Marc A Suchard, Andrew J Holbrook, Min-Hsiu Hsieh

Multiproposal Markov chain Monte Carlo (MCMC) algorithms choose from multiple proposals to generate their next chain step in order to sample from challenging target distributions more efficiently. However, on classical machines, these algorithms require $𝒪 (P)$ target evaluations for each Markov chain step when choosing from $P$ proposals. Recent work demonstrates the possibility of quadratic quantum speedups for one such multiproposal MCMC algorithm. After generating $P$ proposals, this quantum parallel MCMC (QPMCMC) algorithm requires only $𝒪 (\sqrt{P})$ target evaluations at each step, outperforming its classical counterpart. However, generating $P$ proposals using classical computers still requires $𝒪 (P)$ time complexity, resulting in the overall complexity of QPMCMC remaining $𝒪 (P)$ . Here, we present a new, faster quantum multiproposal MCMC strategy, QPMCMC2. With a specially designed Tjelmeland distribution that generates proposals close to the input state, QPMCMC2 requires only $𝒪 (1)$ target evaluations and $𝒪 (log P)$ qubits when computing over a large number of proposals $P$ . Unlike its slower predecessor, the QPMCMC2 Markov kernel (1) maintains detailed balance exactly and (2) is fully explicit for a large class of graphical models. We demonstrate this flexibility by applying QPMCMC2 to novel Ising-type models built on bacterial evolutionary networks and obtain significant speedups for Bayesian ancestral trait reconstruction for 248 observed salmonella bacteria.

多提议马尔可夫链蒙特卡罗（MCMC）算法从多个提议中选择产生下一个链式步骤，以便更有效地从具有挑战性的目标分布中进行采样。然而，在经典机器上，当从P个建议中进行选择时，这些算法需要对每个马尔可夫链步骤进行 (P)目标评估。最近的工作证明了一种多提议MCMC算法的二次量子加速的可能性。在生成P个建议后，该量子并行MCMC （QPMCMC）算法每一步只需要进行 (P)个目标评估，优于经典算法。然而，使用经典计算机生成P个提议仍然需要时间复杂度，这导致QPMCMC的总体复杂度仍然为剩余的时间复杂度。在这里，我们提出了一种新的、更快的量子多提议MCMC策略，QPMCMC2。QPMCMC2采用特殊设计的Tjelmeland分布，生成接近输入状态的提议，在计算大量提议P时，只需要进行(1)个目标评估和 (log P)个量子比特。与它较慢的前身不同，QPMCMC2马尔可夫核(1)精确地保持了详细的平衡，(2)对于大型图形模型是完全显式的。我们通过将QPMCMC2应用于基于细菌进化网络的新型issing型模型，证明了这种灵活性，并获得了248种观察到的沙门氏菌的贝叶斯祖先特征重建的显着速度。

{"title":"Quantum Speedups for Multiproposal MCMC.","authors":"Chin-Yi Lin, Kuo-Chin Chen, Philippe Lemey, Marc A Suchard, Andrew J Holbrook, Min-Hsiu Hsieh","doi":"10.1214/25-ba1546","DOIUrl":"10.1214/25-ba1546","url":null,"abstract":"Multiproposal Markov chain Monte Carlo (MCMC) algorithms choose from multiple proposals to generate their next chain step in order to sample from challenging target distributions more efficiently. However, on classical machines, these algorithms require <math><mi>𝒪</mi> <mo>(</mo> <mi>P</mi> <mo>)</mo></math> target evaluations for each Markov chain step when choosing from <math><mi>P</mi></math> proposals. Recent work demonstrates the possibility of quadratic quantum speedups for one such multiproposal MCMC algorithm. After generating <math><mi>P</mi></math> proposals, this quantum parallel MCMC (QPMCMC) algorithm requires only <math><mi>𝒪</mi> <mo>(</mo> <msqrt><mi>P</mi></msqrt> <mo>)</mo></math> target evaluations at each step, outperforming its classical counterpart. However, generating <math><mi>P</mi></math> proposals using classical computers still requires <math><mi>𝒪</mi> <mo>(</mo> <mi>P</mi> <mo>)</mo></math> time complexity, resulting in the overall complexity of QPMCMC remaining <math><mi>𝒪</mi> <mo>(</mo> <mi>P</mi> <mo>)</mo></math> . Here, we present a new, faster quantum multiproposal MCMC strategy, QPMCMC2. With a specially designed Tjelmeland distribution that generates proposals close to the input state, QPMCMC2 requires only <math><mi>𝒪</mi> <mo>(</mo> <mn>1</mn> <mo>)</mo></math> target evaluations and <math><mi>𝒪</mi> <mo>(</mo> <mtext>log</mtext> <mspace></mspace> <mi>P</mi> <mo>)</mo></math> qubits when computing over a large number of proposals <math><mi>P</mi></math> . Unlike its slower predecessor, the QPMCMC2 Markov kernel (1) maintains detailed balance exactly and (2) is fully explicit for a large class of graphical models. We demonstrate this flexibility by applying QPMCMC2 to novel Ising-type models built on bacterial evolutionary networks and obtain significant speedups for Bayesian ancestral trait reconstruction for 248 observed salmonella bacteria.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12456418/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145139502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity. 基于仿真的贝叶斯计算校准检验：试验量形状灵敏度的选择。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-06-01 Epub Date: 2023-11-23 DOI: 10.1214/23-ba1404

Martin Modrák, Angie H Moon, Shinyoung Kim, Paul Bürkner, Niko Huurre, Kateřina Faltejsková, Andrew Gelman, Aki Vehtari

Simulation-based calibration checking (SBC) is a practical method to validate computationally-derived posterior distributions or their approximations. In this paper, we introduce a new variant of SBC to alleviate several known problems. Our variant allows the user to in principle detect any possible issue with the posterior, while previously reported implementations could never detect large classes of problems including when the posterior is equal to the prior. This is made possible by including additional data-dependent test quantities when running SBC. We argue and demonstrate that the joint likelihood of the data is an especially useful test quantity. Some other types of test quantities and their theoretical and practical benefits are also investigated. We provide theoretical analysis of SBC, thereby providing a more complete understanding of the underlying statistical mechanisms. We also bring attention to a relatively common mistake in the literature and clarify the difference between SBC and checks based on the data-averaged posterior. We support our recommendations with numerical case studies on a multivariate normal example and a case study in implementing an ordered simplex data type for use with Hamiltonian Monte Carlo. The SBC variant introduced in this paper is implemented in the SBC R package.

基于模拟的校准检验（SBC）是一种验证计算后验分布或其近似的实用方法。在本文中，我们引入了一种新的SBC变体来缓解几个已知的问题。我们的变体原则上允许用户检测到任何可能的后验问题，而以前报道的实现永远无法检测到大量的问题，包括当后验等于先验时。这可以通过在运行SBC时包含额外的数据相关测试量来实现。我们论证并证明了数据的联合似然是一个特别有用的测试量。对其他类型的试验量及其理论和实际效益也进行了探讨。我们提供了SBC的理论分析，从而提供了对潜在统计机制的更完整的理解。我们还注意到文献中一个相对常见的错误，并澄清SBC和基于数据平均后验的检查之间的区别。我们用一个多变量正态例子的数值案例研究和一个用哈密顿蒙特卡罗实现有序单纯形数据类型的案例研究来支持我们的建议。本文介绍的SBC变体是在SBC R包中实现的。

{"title":"Simulation-Based Calibration Checking for Bayesian Computation: The Choice of Test Quantities Shapes Sensitivity.","authors":"Martin Modrák, Angie H Moon, Shinyoung Kim, Paul Bürkner, Niko Huurre, Kateřina Faltejsková, Andrew Gelman, Aki Vehtari","doi":"10.1214/23-ba1404","DOIUrl":"10.1214/23-ba1404","url":null,"abstract":"Simulation-based calibration checking (SBC) is a practical method to validate computationally-derived posterior distributions or their approximations. In this paper, we introduce a new variant of SBC to alleviate several known problems. Our variant allows the user to in principle detect any possible issue with the posterior, while previously reported implementations could never detect large classes of problems including when the posterior is equal to the prior. This is made possible by including additional data-dependent test quantities when running SBC. We argue and demonstrate that the joint likelihood of the data is an especially useful test quantity. Some other types of test quantities and their theoretical and practical benefits are also investigated. We provide theoretical analysis of SBC, thereby providing a more complete understanding of the underlying statistical mechanisms. We also bring attention to a relatively common mistake in the literature and clarify the difference between SBC and checks based on the data-averaged posterior. We support our recommendations with numerical case studies on a multivariate normal example and a case study in implementing an ordered simplex data type for use with Hamiltonian Monte Carlo. The SBC variant introduced in this paper is implemented in the SBC R package.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":"20 2","pages":"461-488"},"PeriodicalIF":2.5,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490788/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145234300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data. 多组DNA甲基化数据的非参数贝叶斯差异分析。

IF 4.9 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-06-01 Epub Date: 2023-11-23 DOI: 10.1214/23-ba1407

Chiyu Gu, Veerabhadran Baladandayuthapani, Subharup Guha

DNA methylation datasets in cancer studies are comprised of measurements on a large number of genomic locations called cytosine-phosphate-guanine (CpG) sites with complex correlation structures. A fundamental goal of these studies is the development of statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis relying on a novel class of first order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or biomarker probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances, and makes posterior inferences about the differential genomic signature of patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to analyze a gastrointestinal (GI) cancer dataset exhibiting serial correlation and complex interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.

癌症研究中的DNA甲基化数据集由大量具有复杂相关结构的基因组位置（称为胞嘧啶-磷酸-鸟嘌呤（CpG）位点）的测量组成。这些研究的一个基本目标是发展统计技术，以识别由不同实验或生物学条件定义的多组患者的疾病基因组特征。我们提出了BayesDiff，这是一种非参数贝叶斯方法，用于微分分析，依赖于一类新的一阶混合模型，称为Sticky Pitman-Yor过程或两餐厅两美食特许经营（2R2CF）。BayesDiff方法灵活地利用来自所有CpG位点或生物标记探针的信息，自适应地适应由于探针间距离差异很大而产生的序列依赖性，并对患者群体的差异基因组特征进行后验推断。通过模拟研究，我们证明了BayesDiff程序相对于现有的差异DNA甲基化统计技术的有效性。该方法被应用于分析胃肠道（GI）癌症数据集，显示出序列相关性和复杂的相互作用模式。这些结果支持并补充了上消化道癌症中DNA甲基化和基因关联的已知方面。

{"title":"Nonparametric Bayes Differential Analysis of Multigroup DNA Methylation Data.","authors":"Chiyu Gu, Veerabhadran Baladandayuthapani, Subharup Guha","doi":"10.1214/23-ba1407","DOIUrl":"10.1214/23-ba1407","url":null,"abstract":"DNA methylation datasets in cancer studies are comprised of measurements on a large number of genomic locations called cytosine-phosphate-guanine (CpG) sites with complex correlation structures. A fundamental goal of these studies is the development of statistical techniques that can identify disease genomic signatures across multiple patient groups defined by different experimental or biological conditions. We propose BayesDiff, a nonparametric Bayesian approach for differential analysis relying on a novel class of first order mixture models called the Sticky Pitman-Yor process or two-restaurant two-cuisine franchise (2R2CF). The BayesDiff methodology flexibly utilizes information from all CpG sites or biomarker probes, adaptively accommodates any serial dependence due to the widely varying inter-probe distances, and makes posterior inferences about the differential genomic signature of patient groups. Using simulation studies, we demonstrate the effectiveness of the BayesDiff procedure relative to existing statistical techniques for differential DNA methylation. The methodology is applied to analyze a gastrointestinal (GI) cancer dataset exhibiting serial correlation and complex interaction patterns. The results support and complement known aspects of DNA methylation and gene association in upper GI cancers.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":"20 2","pages":"489-518"},"PeriodicalIF":4.9,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144129658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Gridding and Parameter Expansion for Scalable Latent Gaussian Models of Spatial Multivariate Data. 空间多元数据可伸缩隐高斯模型的网格化和参数展开。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-03-12 DOI: 10.1214/25-BA1515

Michele Peruzzi, Sudipto Banerjee, David B Dunson, Andrew O Finley

Scalable spatial GPs for massive datasets can be built via sparse Directed Acyclic Graphs (DAGs) where a small number of directed edges is sufficient to flexibly characterize spatial dependence. The DAG can be used to devise fast algorithms for posterior sampling of the latent process, but these may exhibit pathological behavior in estimating covariance parameters. In this article, we introduce gridding and parameter expansion methods to improve the practical performance of MCMC algorithms in terms of effective sample size per unit time (ESS/s). Gridding is a model-based strategy that reduces the number of expensive operations necessary during MCMC on irregularly spaced data. Parameter expansion reduces dependence in posterior samples in spatial regression for high resolution data. These two strategies lead to computational gains in the big data settings on which we focus. We consider popular constructions of univariate spatial processes based on Matérn covariance functions and multivariate coregionalization models for Gaussian outcomes in extensive analyses of synthetic datasets comparing with alternative methods. We demonstrate effectiveness of our proposed methods in a forestry application using remotely sensed data from NASA's Goddard LiDAR, Hyper-Spectral, and Thermal imager (G-LiHT).

大规模数据集的可扩展空间GPs可以通过稀疏有向无环图（dag）建立，其中少量有向边足以灵活地表征空间依赖性。DAG可用于设计潜在过程后验采样的快速算法，但这些算法在估计协方差参数时可能表现出病态行为。在本文中，我们引入网格化和参数展开方法，以提高MCMC算法在单位时间有效样本量（ESS/s）方面的实际性能。网格是一种基于模型的策略，它减少了对不规则间隔数据进行MCMC时所需的昂贵操作的数量。参数展开降低了高分辨率数据空间回归中后验样本的依赖性。这两种策略在我们关注的大数据环境中带来了计算收益。在综合数据集的广泛分析中，我们考虑了基于mat协方差函数和高斯结果的多元共区域化模型的单变量空间过程的流行结构，并与其他方法进行了比较。我们利用NASA戈达德激光雷达、高光谱和热成像仪（g - light）的遥感数据，在林业应用中证明了我们提出的方法的有效性。

{"title":"Gridding and Parameter Expansion for Scalable Latent Gaussian Models of Spatial Multivariate Data.","authors":"Michele Peruzzi, Sudipto Banerjee, David B Dunson, Andrew O Finley","doi":"10.1214/25-BA1515","DOIUrl":"10.1214/25-BA1515","url":null,"abstract":"Scalable spatial GPs for massive datasets can be built via sparse Directed Acyclic Graphs (DAGs) where a small number of directed edges is sufficient to flexibly characterize spatial dependence. The DAG can be used to devise fast algorithms for posterior sampling of the latent process, but these may exhibit pathological behavior in estimating covariance parameters. In this article, we introduce gridding and parameter expansion methods to improve the practical performance of MCMC algorithms in terms of effective sample size per unit time (ESS/s). Gridding is a model-based strategy that reduces the number of expensive operations necessary during MCMC on irregularly spaced data. Parameter expansion reduces dependence in posterior samples in spatial regression for high resolution data. These two strategies lead to computational gains in the big data settings on which we focus. We consider popular constructions of univariate spatial processes based on Matérn covariance functions and multivariate coregionalization models for Gaussian outcomes in extensive analyses of synthetic datasets comparing with alternative methods. We demonstrate effectiveness of our proposed methods in a forestry application using remotely sensed data from NASA's Goddard LiDAR, Hyper-Spectral, and Thermal imager (G-LiHT).","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12533814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145330923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Exploiting Multivariate Network Meta-Analysis: A Calibrated Bayesian Composite Likelihood Inference. 利用多元网络元分析：校准贝叶斯复合似然推断。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-02-20 DOI: 10.1214/25-ba1511

Yifei Wang, Lifeng Lin, Yu-Lun Liu

Multivariate network meta-analysis has emerged as a powerful tool for evidence synthesis by incorporating multiple outcomes and treatments. Despite its advantages, this method comes with methodological challenges, such as the issue of unreported within-study correlations among treatments and outcomes, which can lead to biased estimates and misleading conclusions. In this paper, we propose a calibrated Bayesian composite likelihood approach to overcome this limitation. The proposed method eliminates the need for a fully specified likelihood function while allowing for the unavailability of within-study correlations among treatments and outcomes. Additionally, we developed a hybrid Gibbs sampler algorithm along with the Open-Faced Sandwich post-sampling adjustment to enable robust posterior inference. Through comprehensive simulation studies, we demonstrated that the proposed approach yields unbiased estimates while maintaining coverage probabilities close to the nominal levels. We implemented the proposed method to two real-world network meta-analysis datasets: one comparing treatment procedures for root coverage and the other comparing treatments for anemia in patients with chronic kidney disease.

多元网络荟萃分析已成为证据综合的有力工具，通过纳入多种结果和治疗。尽管有其优势，但该方法也面临着方法学上的挑战，例如未报告的研究内治疗和结果之间的相关性问题，这可能导致有偏见的估计和误导性的结论。在本文中，我们提出了一种校正贝叶斯复合似然方法来克服这一限制。提出的方法消除了对完全指定的似然函数的需要，同时允许治疗和结果之间的研究内相关性不可用。此外，我们开发了一种混合Gibbs采样器算法以及开放式三明治采样后调整，以实现鲁棒后验推理。通过全面的模拟研究，我们证明了所提出的方法在保持覆盖概率接近名义水平的同时产生无偏估计。我们在两个现实世界的网络荟萃分析数据集上实施了提出的方法：一个比较根覆盖的治疗方法，另一个比较慢性肾病患者贫血的治疗方法。

{"title":"Exploiting Multivariate Network Meta-Analysis: A Calibrated Bayesian Composite Likelihood Inference.","authors":"Yifei Wang, Lifeng Lin, Yu-Lun Liu","doi":"10.1214/25-ba1511","DOIUrl":"10.1214/25-ba1511","url":null,"abstract":"Multivariate network meta-analysis has emerged as a powerful tool for evidence synthesis by incorporating multiple outcomes and treatments. Despite its advantages, this method comes with methodological challenges, such as the issue of unreported within-study correlations among treatments and outcomes, which can lead to biased estimates and misleading conclusions. In this paper, we propose a calibrated Bayesian composite likelihood approach to overcome this limitation. The proposed method eliminates the need for a fully specified likelihood function while allowing for the unavailability of within-study correlations among treatments and outcomes. Additionally, we developed a hybrid Gibbs sampler algorithm along with the Open-Faced Sandwich post-sampling adjustment to enable robust posterior inference. Through comprehensive simulation studies, we demonstrated that the proposed approach yields unbiased estimates while maintaining coverage probabilities close to the nominal levels. We implemented the proposed method to two real-world network meta-analysis datasets: one comparing treatment procedures for root coverage and the other comparing treatments for anemia in patients with chronic kidney disease.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453069/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Causally Sound Priors for Binary Experiments. 二元实验的因果合理先验。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2025-01-28 DOI: 10.1214/25-BA1506

Nicholas J Irons, Carlos Cinelli

We introduce the BREASE framework for the Bayesian analysis of randomized controlled trials with binary treatment and outcome. Approaching the problem from a causal inference perspective, we propose parameterizing the likelihood in terms of the baseline risk, efficacy, and adverse side effects of the treatment, along with a flexible, yet intuitive and tractable jointly independent beta prior distribution on these parameters, which we show to be a generalization of the Dirichlet prior for the joint distribution of potential outcomes. Our approach has a number of desirable characteristics when compared to current mainstream alternatives: (i) it naturally induces prior dependence between expected outcomes in the treatment and control groups; (ii) as the baseline risk, efficacy and risk of adverse side effects are quantities commonly present in the clinicians' vocabulary, the hyperparameters of the prior are directly interpretable, thus facilitating the elicitation of prior knowledge and sensitivity analysis; and (iii) we provide analytical formulae for the marginal likelihood, Bayes factor, and other posterior quantities, as well as an exact posterior sampling algorithm and an accurate and fast data-augmented Gibbs sampler in cases where traditional MCMC fails. Empirical examples demonstrate the utility of our methods for estimation, hypothesis testing, and sensitivity analysis of treatment effects.

我们引入BREASE框架，对具有二元治疗和结果的随机对照试验进行贝叶斯分析。从因果推理的角度来解决这个问题，我们提出了参数化治疗的基线风险、疗效和不良副作用的可能性，以及这些参数的灵活、直观和易于处理的联合独立beta先验分布，我们认为这是Dirichlet先验对潜在结果联合分布的推广。与目前的主流替代方法相比，我们的方法具有许多可取的特征：(i)它自然地诱导了治疗组和对照组预期结果之间的先验依赖性；（ii）由于基线风险、疗效和不良副作用风险是临床医生词汇中常见的量，先验的超参数是可直接解释的，从而便于先验知识的提取和敏感性分析；（iii）在传统MCMC失效的情况下，我们提供了边际似然、贝叶斯因子和其他后验量的解析公式，以及精确的后验抽样算法和准确快速的数据增强吉布斯采样器。实证例子证明了我们的方法在估计、假设检验和治疗效果敏感性分析方面的实用性。

{"title":"Causally Sound Priors for Binary Experiments.","authors":"Nicholas J Irons, Carlos Cinelli","doi":"10.1214/25-BA1506","DOIUrl":"10.1214/25-BA1506","url":null,"abstract":"We introduce the BREASE framework for the Bayesian analysis of randomized controlled trials with binary treatment and outcome. Approaching the problem from a causal inference perspective, we propose parameterizing the likelihood in terms of the baseline risk, efficacy, and adverse side effects of the treatment, along with a flexible, yet intuitive and tractable jointly independent beta prior distribution on these parameters, which we show to be a generalization of the Dirichlet prior for the joint distribution of potential outcomes. Our approach has a number of desirable characteristics when compared to current mainstream alternatives: (i) it naturally induces prior dependence between expected outcomes in the treatment and control groups; (ii) as the baseline risk, efficacy and risk of adverse side effects are quantities commonly present in the clinicians' vocabulary, the hyperparameters of the prior are directly interpretable, thus facilitating the elicitation of prior knowledge and sensitivity analysis; and (iii) we provide analytical formulae for the marginal likelihood, Bayes factor, and other posterior quantities, as well as an exact posterior sampling algorithm and an accurate and fast data-augmented Gibbs sampler in cases where traditional MCMC fails. Empirical examples demonstrate the utility of our methods for estimation, hypothesis testing, and sensitivity analysis of treatment effects.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12416923/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145031209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Bag of DAGs: Inferring Directional Dependence in Spatiotemporal Processes. DAGs袋：推断时空过程中的方向依赖性。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2024-11-11 DOI: 10.1214/24-ba1473

Bora Jin, Michele Peruzzi, David Dunson

We propose a class of nonstationary processes to characterize space- and time-varying directional associations in point-referenced data. We are motivated by spatiotemporal modeling of air pollutants in which local wind patterns are key determinants of the pollutant spread, but information regarding prevailing wind directions may be missing or unreliable. We propose to map a discrete set of wind directions to edges in a sparse directed acyclic graph (DAG), accounting for uncertainty in directional correlation patterns across a domain. The resulting Bag of DAGs processes (BAGs) lead to interpretable nonstationarity and scalability for large data due to sparsity of DAGs in the bag. We outline Bayesian hierarchical models using BAGs and illustrate inferential and performance gains of our methods compared to other state-of-the-art alternatives. We analyze fine particulate matter using high-resolution data from low-cost air quality sensors in California during the 2020 wildfire season. An R package is available on GitHub.

我们提出了一类非平稳过程来表征点参考数据中的时空变化方向关联。我们的动机是空气污染物的时空建模，其中当地的风模式是污染物扩散的关键决定因素，但有关盛行风向的信息可能缺失或不可靠。我们建议将一组离散的风向映射到稀疏有向无环图（DAG）的边缘，考虑到跨域方向相关模式的不确定性。由此产生的Bag of dag过程（BAGs）由于Bag中dag的稀疏性导致了大数据的可解释非平稳性和可扩展性。我们概述了使用BAGs的贝叶斯层次模型，并说明了与其他最先进的替代方法相比，我们的方法的推理和性能增益。我们在2020年野火季节使用来自加州低成本空气质量传感器的高分辨率数据分析细颗粒物。在GitHub上可以找到R包。

引用次数: 0

Bayesian Analysis of Exponential Random Graph Models Using Stochastic Gradient Markov Chain Monte Carlo. 基于随机梯度马尔可夫链蒙特卡罗的指数随机图模型的贝叶斯分析

IF 4.9 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2024-06-01 Epub Date: 2024-04-09 DOI: 10.1214/23-BA1364

Qian Zhang, Faming Liang

The exponential random graph model (ERGM) is a popular model for social networks, which is known to have an intractable likelihood function. Sampling from the posterior for such a model is a long-standing problem in statistical research. We analyze the performance of the stochastic gradient Langevin dynamics (SGLD) algorithm (also known as noisy Longevin Monte Carlo) in tackling this problem, where the stochastic gradient is calculated via running a short Markov chain (the so-called inner Markov chain in this paper) at each iteration. We show that if the model size grows with the network size slowly enough, then SGLD converges to the true posterior in 2-Wasserstein distance as the network size and iteration number become large regardless of the length of the inner Markov chain performed at each iteration. Our study provides a scalable algorithm for analyzing large-scale social networks with possibly high-dimensional ERGMs.

指数随机图模型（ERGM）是一种流行的社交网络模型，它具有难以处理的似然函数。这种模型的后验抽样是统计研究中一个长期存在的问题。我们分析了随机梯度朗格万动力学（SGLD）算法（在蒙特卡洛中也称为噪声寿命）在解决这个问题时的性能，其中随机梯度是通过在每次迭代中运行短马尔可夫链（本文中所谓的内马尔可夫链）来计算的。我们证明，如果模型大小随着网络大小的增长足够慢，那么无论每次迭代执行的内马尔可夫链的长度如何，随着网络大小和迭代次数的增加，SGLD收敛到2-Wasserstein距离的真实后验。我们的研究提供了一种可扩展的算法，用于分析可能具有高维ergm的大型社交网络。

引用次数: 0

Easily Computed Marginal Likelihoods from Posterior Simulation Using the THAMES Estimator. 利用泰晤士估计器从后验模拟中轻松计算边际似然。

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2024-04-23 DOI: 10.1214/24-ba1422

Martin Metodiev, Marie Perrot-Dockès, Sarah Ouadah, Nicholas J Irons, Pierre Latouche, Adrian E Raftery

We propose an easily computed estimator of marginal likelihoods from posterior simulation output, via reciprocal importance sampling, combining earlier proposals of DiCiccio et al (1997) and Robert and Wraith (2009). This involves only the unnormalized posterior densities from the sampled parameter values, and does not involve additional simulations beyond the main posterior simulation, or additional complicated calculations, provided that the parameter space is unconstrained. Even if this is not the case, the estimator is easily adjusted by a simple Monte Carlo approximation. It is unbiased for the reciprocal of the marginal likelihood, consistent, has finite variance, and is asymptotically normal. It involves one user-specified control parameter, and we derive an optimal way of specifying this. We illustrate it with several numerical examples.

我们结合DiCiccio等人（1997）和Robert和Wraith（2009）的早期建议，通过互反重要性抽样，从后验模拟输出中提出一个易于计算的边际似然估计。这只涉及采样参数值的非归一化后验密度，而不涉及主要后验模拟之外的额外模拟，或额外的复杂计算，前提是参数空间是无约束的。即使不是这样，估计量也很容易通过简单的蒙特卡罗近似来调整。它对于边际似然的倒数是无偏的，一致的，有有限的方差，并且是渐近正态的。它涉及一个用户指定的控制参数，并且我们推导了指定该参数的最佳方法。我们用几个数值例子来说明它。

引用次数: 0

Fast Methods for Posterior Inference of Two-Group Normal-Normal Models. 两组正态模型后验推理的快速方法

IF 2.5 2区数学 Q1 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Bayesian Analysis

Pub Date : 2023-09-01 Epub Date: 2022-09-09 DOI: 10.1214/22-ba1329

Philip Greengard, Jeremy Hoskins, Charles C Margossian, Jonah Gabry, Andrew Gelman, Aki Vehtari

We describe a class of algorithms for evaluating posterior moments of certain Bayesian linear regression models with a normal likelihood and a normal prior on the regression coefficients. The proposed methods can be used for hierarchical mixed effects models with partial pooling over one group of predictors, as well as random effects models with partial pooling over two groups of predictors. We demonstrate the performance of the methods on two applications, one involving U.S. opinion polls and one involving the modeling of COVID-19 outbreaks in Israel using survey data. The algorithms involve analytical marginalization of regression coefficients followed by numerical integration of the remaining low-dimensional density. The dominant cost of the algorithms is an eigendecomposition computed once for each value of the outside parameter of integration. Our approach drastically reduces run times compared to state-of-the-art Markov chain Monte Carlo (MCMC) algorithms. The latter, in addition to being computationally expensive, can also be difficult to tune when applied to hierarchical models.

我们描述了一类算法，用于评估回归系数上具有正态似然和正态先验的某些贝叶斯线性回归模型的后验矩。所提出的方法可用于在一组预测因子上具有部分池化的分层混合效应模型，以及在两组预测因子中具有部分池的随机效应模型。我们在两个应用程序上演示了这些方法的性能，一个涉及美国民意调查，另一个涉及使用调查数据对以色列新冠肺炎疫情进行建模。算法包括回归系数的分析边缘化，然后对剩余的低维密度进行数值积分。算法的主要成本是为积分的外部参数的每个值计算一次本征分解。与最先进的马尔可夫链蒙特卡罗（MCMC）算法相比，我们的方法大大缩短了运行时间。后者除了计算成本高之外，在应用于分层模型时也很难进行调整。

{"title":"Fast Methods for Posterior Inference of Two-Group Normal-Normal Models.","authors":"Philip Greengard, Jeremy Hoskins, Charles C Margossian, Jonah Gabry, Andrew Gelman, Aki Vehtari","doi":"10.1214/22-ba1329","DOIUrl":"10.1214/22-ba1329","url":null,"abstract":"We describe a class of algorithms for evaluating posterior moments of certain Bayesian linear regression models with a normal likelihood and a normal prior on the regression coefficients. The proposed methods can be used for hierarchical mixed effects models with partial pooling over one group of predictors, as well as random effects models with partial pooling over two groups of predictors. We demonstrate the performance of the methods on two applications, one involving U.S. opinion polls and one involving the modeling of COVID-19 outbreaks in Israel using survey data. The algorithms involve analytical marginalization of regression coefficients followed by numerical integration of the remaining low-dimensional density. The dominant cost of the algorithms is an eigendecomposition computed once for each value of the outside parameter of integration. Our approach drastically reduces run times compared to state-of-the-art Markov chain Monte Carlo (MCMC) algorithms. The latter, in addition to being computationally expensive, can also be difficult to tune when applied to hierarchical models.","PeriodicalId":55398,"journal":{"name":"Bayesian Analysis","volume":" ","pages":"889-907"},"PeriodicalIF":2.5,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12442500/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46750531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0