arXiv - STAT - Computation: Latest Papers

Exact confidence intervals for functions of parameters in the k-sample multinomial problem
Pub Date : 2024-06-27 DOI: arxiv-2406.19141
Michael C Sachs, Erin E Gabriel, Michael P Fay
When the target of inference is a real-valued function of probability parameters in the k-sample multinomial problem, variance estimation may be challenging. In small samples, methods like the nonparametric bootstrap or delta method may perform poorly. We propose a novel general method in this setting for computing exact p-values and confidence intervals, which means that type I error rates are correctly bounded and confidence intervals have at least nominal coverage at all sample sizes. Our method is applicable to any real-valued function of multinomial probabilities, accommodating an arbitrary number of samples with varying category counts. We describe the method and provide an implementation of it in R, with some computational optimization to ensure broad applicability. Simulations demonstrate our method's ability to maintain correct coverage rates in settings where the nonparametric bootstrap fails.
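To illustrate what "at least nominal coverage at all sample sizes" means, here is a minimal sketch for the simplest special case: a single binomial proportion with the classical Clopper-Pearson interval. This is not the authors' k-sample method or their R implementation; the function names (`binom_cdf`, `clopper_pearson`, `coverage`) are hypothetical and chosen only for this example.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    if x < 0:
        return 0.0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min(x, n) + 1))


def _bisect(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi], found by bisection."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided interval obtained by inverting two one-sided binomial tests."""
    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


def coverage(n, p, alpha=0.05):
    """Exact coverage probability of the interval at true proportion p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = clopper_pearson(x, n, alpha)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


print(clopper_pearson(3, 10))
print(coverage(10, 0.3))
```

Because the interval is built by inverting exact tests, `coverage(n, p)` should never drop below `1 - alpha`, even for very small `n`. Bootstrap and delta-method intervals carry no such guarantee.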
Citations: 0
Torchtree: flexible phylogenetic model development and inference using PyTorch
Pub Date : 2024-06-26 DOI: arxiv-2406.18044
Mathieu Fourment, Matthew Macaulay, Christiaan J Swanepoel, Xiang Ji, Marc A Suchard, Frederick A Matsen IV
Bayesian inference has predominantly relied on the Markov chain Monte Carlo (MCMC) algorithm for many years. However, MCMC is computationally laborious, especially for complex phylogenetic models of time trees. This bottleneck has led to the search for alternatives, such as variational Bayes, which can scale better to large datasets. In this paper, we introduce torchtree, a framework written in Python that allows developers to easily implement rich phylogenetic models and algorithms using a fixed tree topology. One can either use automatic differentiation, or leverage torchtree's plug-in system to compute gradients analytically for model components for which automatic differentiation is slow. We demonstrate that the torchtree variational inference framework performs similarly to BEAST in terms of speed and approximation accuracy. Furthermore, we explore the use of the forward KL divergence as an optimizing criterion for variational inference, which can handle discontinuous and non-differentiable models. Our experiments show that inference using the forward KL divergence tends to be faster per iteration compared to the evidence lower bound (ELBO) criterion, although the ELBO-based inference may converge faster in some cases. Overall, torchtree provides a flexible and efficient framework for phylogenetic model development and inference using PyTorch.
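For reference, the two criteria contrasted in this abstract can be written in standard textbook notation. The symbols $q_\phi$ (variational family), $\theta$ (model parameters), and $D$ (data) are assumptions introduced here, not notation taken from the paper.

```latex
\begin{align*}
\mathrm{ELBO}(\phi)
  &= \mathbb{E}_{q_\phi(\theta)}\left[\log p(D, \theta) - \log q_\phi(\theta)\right]
   = \log p(D) - \mathrm{KL}\left(q_\phi(\theta) \,\|\, p(\theta \mid D)\right), \\
\mathrm{KL}_{\text{forward}}(\phi)
  &= \mathrm{KL}\left(p(\theta \mid D) \,\|\, q_\phi(\theta)\right)
   = \mathbb{E}_{p(\theta \mid D)}\left[\log p(\theta \mid D) - \log q_\phi(\theta)\right].
\end{align*}
```

Maximizing the ELBO minimizes the reverse divergence, and its gradient generally requires differentiating the model density with respect to $\theta$. In the forward divergence, $\phi$ enters only through $\log q_\phi(\theta)$, which is consistent with the abstract's remark that this criterion can handle non-differentiable models. How torchtree approximates the expectation under the posterior is not stated in the abstract.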
Citations: 0
Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation
Pub Date : 2024-06-25 DOI: arxiv-2406.17307
Jian Cao, Matthias Katzfuss
We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sampling sequentially from $m$-dimensional TMVN distributions, where $m$ is a tuning parameter controlling the fidelity. This allows us to overcome the existing methods' crucial problem of rapidly decreasing acceptance rates for increasing dimension. Throughout our experiments with up to tens of thousands of dimensions, we can produce high-fidelity samples with $m$ in the dozens, achieving superior scalability compared to existing state-of-the-art methods. We study a tetrachloroethylene concentration dataset that has $3{,}971$ observed responses and $20{,}730$ undetected responses, together modeled as a partially censored Gaussian process, where our method enables posterior inference for the censored responses through sampling a $20{,}730$-dimensional TMVN distribution.
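The product-of-conditionals decomposition and its nearest-neighbor approximation can be sketched generically as follows. The conditioning-set notation $c(i)$ is an assumption made for illustration. The abstract does not say how the truncation constraints enter each factor, or whether $m$ counts $x_i$ itself.

```latex
\begin{align*}
p(x_1, \dots, x_n)
  &= \prod_{i=1}^{n} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
   \approx \prod_{i=1}^{n} p\left(x_i \mid x_{c(i)}\right), \\
c(i) &\subseteq \{1, \dots, i-1\}, \qquad |c(i)| \le m .
\end{align*}
```

Each factor then involves at most a small, fixed number of variables, so the total cost grows linearly in $n$ for fixed $m$, matching the linear complexity claimed above.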
Citations: 0
Genealogical processes of non-neutral population models under rapid mutation
Pub Date : 2024-06-24 DOI: arxiv-2406.16465
Jere Koskela, Paul A. Jenkins, Adam M. Johansen, Dario Spano
We show that genealogical trees arising from a broad class of non-neutral models of population evolution converge to the Kingman coalescent under a suitable rescaling of time. As well as non-neutral biological evolution, our results apply to genetic algorithms encompassing the prominent class of sequential Monte Carlo (SMC) methods. The time rescaling we need differs slightly from that used in classical results for convergence to the Kingman coalescent, which has implications for the performance of different resampling schemes in SMC algorithms. In addition, our work substantially simplifies earlier proofs of convergence to the Kingman coalescent, and corrects an error common to several earlier results.
Citations: 0
Recursive variational Gaussian approximation with the Whittle likelihood for linear non-Gaussian state space models
Pub Date : 2024-06-23 DOI: arxiv-2406.15998
Bao Anh Vu, David Gunawan, Andrew Zammit-Mangion
Parameter inference for linear and non-Gaussian state space models is challenging because the likelihood function contains an intractable integral over the latent state variables. Exact inference using Markov chain Monte Carlo is computationally expensive, particularly for long time series data. Variational Bayes methods are useful when exact inference is infeasible. These methods approximate the posterior density of the parameters by a simple and tractable distribution found through optimisation. In this paper, we propose a novel sequential variational Bayes approach that makes use of the Whittle likelihood for computationally efficient parameter inference in this class of state space models. Our algorithm, which we call Recursive Variational Gaussian Approximation with the Whittle Likelihood (R-VGA-Whittle), updates the variational parameters by processing data in the frequency domain. At each iteration, R-VGA-Whittle requires the gradient and Hessian of the Whittle log-likelihood, which are available in closed form for a wide class of models. Through several examples using a linear Gaussian state space model and a univariate/bivariate non-Gaussian stochastic volatility model, we show that R-VGA-Whittle provides good approximations to posterior distributions of the parameters and is very computationally efficient when compared to asymptotically exact methods such as Hamiltonian Monte Carlo.
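For readers unfamiliar with it, a common univariate form of the Whittle log-likelihood is sketched below. The notation ($y_t$ for the series, $f_\theta$ for the model spectral density, $I$ for the periodogram) is assumed here. Normalizing constants and the range of frequencies vary between references, so the paper's exact convention may differ.

```latex
\begin{align*}
I(\omega_k) &= \frac{1}{2\pi n}\left|\sum_{t=1}^{n} y_t\, e^{-\mathrm{i}\omega_k t}\right|^2,
  \qquad \omega_k = \frac{2\pi k}{n}, \\
\ell_W(\theta) &= -\sum_{k=1}^{\lfloor (n-1)/2 \rfloor}
  \left[\log f_\theta(\omega_k) + \frac{I(\omega_k)}{f_\theta(\omega_k)}\right].
\end{align*}
```

The sum runs over Fourier frequencies and depends on $\theta$ only through $f_\theta$, so no integral over the latent states is needed. Gradients and Hessians follow directly whenever $f_\theta$ has a closed form.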
Citations: 0
Efficient Multivariate Initial Sequence Estimators for MCMC
Pub Date : 2024-06-22 DOI: arxiv-2406.15874
Arka Banerjee, Dootika Vats
Estimating Monte Carlo error is critical to valid simulation results in Markov chain Monte Carlo (MCMC), and initial sequence estimators were one of the first methods introduced for this. Over the last few years, focus has been on multivariate assessment of simulation error, and many multivariate generalizations of univariate methods have been developed. The multivariate initial sequence estimator is known to exhibit superior finite-sample performance compared to its competitors. However, the multivariate initial sequence estimator can be prohibitively slow, limiting its widespread use. We provide an efficient alternative to the multivariate initial sequence estimator that inherits both its asymptotic properties and its superior finite-sample performance. The effectiveness of the proposed estimator is shown via some MCMC example implementations. Further, we also present univariate and multivariate initial sequence estimators for when parallel MCMC chains are run and demonstrate their effectiveness over popular alternatives.
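As background, the classical univariate initial positive sequence estimator (due to Geyer) can be sketched in a few lines. This is not the efficient multivariate estimator proposed in the paper. It only shows the basic idea of summing adjacent pairs of sample autocovariances until a pair turns non-positive. The function names are hypothetical.

```python
def autocov(x, lag):
    """Biased sample autocovariance of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def initial_positive_sequence_variance(x):
    """Estimate the asymptotic variance in the Markov chain CLT.

    Uses sigma^2 = -gamma_0 + 2 * sum_m Gamma_m with
    Gamma_m = gamma_{2m} + gamma_{2m+1}, truncating the sum at the
    first m for which Gamma_m is not positive.
    """
    n = len(x)
    gamma0 = autocov(x, 0)
    total = 0.0
    m = 0
    while 2 * m + 1 < n:
        pair = autocov(x, 2 * m) + autocov(x, 2 * m + 1)
        if pair <= 0:
            break
        total += pair
        m += 1
    return -gamma0 + 2 * total


chain = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9, 0.7, 0.2, 0.3, 0.5]
print(initial_positive_sequence_variance(chain))
```

Dividing the returned value by the chain length and taking the square root gives a Monte Carlo standard error for the sample mean. The multivariate versions discussed in the abstract replace the scalar autocovariances with autocovariance matrices.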
Citations: 0
Approximate Bayesian Computation sequential Monte Carlo via random forests
Pub Date : 2024-06-22 DOI: arxiv-2406.15865
Khanh N. Dinh, Zijin Xiang, Zhihan Liu, Simon Tavaré
Approximate Bayesian Computation (ABC) is a popular inference method when likelihoods are hard to come by. Practical bottlenecks of ABC applications include selecting statistics that summarize the data without losing too much information or introducing uncertainty, and choosing distance functions and tolerance thresholds that balance accuracy and computational efficiency. Recent studies have shown that ABC methods using random forest (RF) methodology perform well while circumventing many of ABC's drawbacks. However, RF construction is computationally expensive for large numbers of trees and model simulations, and there can be high uncertainty in the posterior if the prior distribution is uninformative. Here we adapt distributional random forests to the ABC setting, and introduce Approximate Bayesian Computation sequential Monte Carlo with random forests (ABC-SMC-(D)RF). This updates the prior distribution iteratively to focus on the most likely regions in the parameter space. We show that ABC-SMC-(D)RF can accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.
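The roles of the summary statistic, distance function, and tolerance threshold mentioned above are easiest to see in plain rejection ABC, sketched below for a toy binomial model. This is the basic algorithm that random-forest and sequential variants aim to improve on, not ABC-SMC-(D)RF itself. The model, the prior, and all names are assumptions made for illustration.

```python
import random


def simulate(p, n, rng):
    """Simulate the summary statistic: successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def abc_rejection(observed, n, num_draws, eps, seed=0):
    """Keep prior draws whose simulated data fall within eps of the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_draws):
        p = rng.random()                   # draw from a Uniform(0, 1) prior
        x = simulate(p, n, rng)            # simulate data given the draw
        distance = abs(x - observed) / n   # distance between summary statistics
        if distance <= eps:                # tolerance threshold
            accepted.append((p, distance))
    return accepted


samples = abc_rejection(observed=7, n=20, num_draws=5000, eps=0.05)
posterior_mean = sum(p for p, _ in samples) / len(samples)
print(len(samples), posterior_mean)
```

Shrinking `eps` makes the accepted draws a better approximation of the posterior but lowers the acceptance rate. This is the accuracy versus cost trade-off the abstract describes.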
Citations: 0
An agent-based model of behaviour change calibrated to reversal learning data
Pub Date : 2024-06-20 DOI: arxiv-2406.14062
Roben Delos Reyes, Hugo Lyons Keenan, Cameron Zachreson
Behaviour change lies at the heart of many observable collective phenomena such as the transmission and control of infectious diseases, adoption of public health policies, and migration of animals to new habitats. Representing the process of individual behaviour change in computer simulations of these phenomena remains an open challenge. Often, computational models use phenomenological implementations with limited support from behavioural data. Without a strong connection to observable quantities, such models have limited utility for simulating observed and counterfactual scenarios of emergent phenomena because they cannot be validated or calibrated. Here, we present a simple stochastic individual-based model of reversal learning that captures fundamental properties of individual behaviour change, namely, the capacity to learn based on accumulated reward signals, and the transient persistence of learned behaviour after rewards are removed or altered. The model has only two parameters, and we use approximate Bayesian computation to demonstrate that they are fully identifiable from empirical reversal learning time series data. Finally, we demonstrate how the model can be extended to account for the increased complexity of behavioural dynamics over longer time scales involving fluctuating stimuli. This work is a step towards the development and evaluation of fully identifiable individual-level behaviour change models that can function as validated submodels for complex simulations of collective behaviour change.
Citations: 0
Multi-level Phenotypic Models of Cardiovascular Disease and Obstructive Sleep Apnea Comorbidities: A Longitudinal Wisconsin Sleep Cohort Study
Pub Date : 2024-06-19 DOI: arxiv-2406.18602
Duy Nguyen, Ca Hoang, Phat K. Huynh, Tien Truong, Dang Nguyen, Abhay Sharma, Trung Q. Le
Cardiovascular diseases (CVDs) are notably prevalent among patients with obstructive sleep apnea (OSA), posing unique challenges in predicting CVD progression due to the intricate interactions of comorbidities. Traditional models typically lack the necessary dynamic and longitudinal scope to accurately forecast CVD trajectories in OSA patients. This study introduces a novel multi-level phenotypic model to analyze the progression and interplay of these conditions over time, utilizing data from the Wisconsin Sleep Cohort, which includes 1,123 participants followed for decades. Our methodology comprises three advanced steps: (1) Conducting feature importance analysis through tree-based models to underscore critical predictive variables like total cholesterol, low-density lipoprotein (LDL), and diabetes. (2) Developing a logistic mixed-effects model (LGMM) to track longitudinal transitions and pinpoint significant factors, which displayed a diagnostic accuracy of 0.9556. (3) Implementing t-distributed Stochastic Neighbor Embedding (t-SNE) alongside Gaussian Mixture Models (GMM) to segment patient data into distinct phenotypic clusters that reflect varied risk profiles and disease progression pathways. This phenotypic clustering revealed two main groups, with one showing a markedly increased risk of major adverse cardiovascular events (MACEs), underscored by the significant predictive role of nocturnal hypoxia and sympathetic nervous system activity from sleep data. Analysis of transitions and trajectories with t-SNE and GMM highlighted different progression rates within the cohort, with one cluster progressing more slowly towards severe CVD states than the other. This study offers a comprehensive understanding of the dynamic relationship between CVD and OSA, providing valuable tools for predicting disease onset and tailoring treatment approaches.
Citations: 0
Parallelizing MCMC with Machine Learning Classifier and Its Criterion Based on Kullback-Leibler Divergence
Pub Date : 2024-06-17 DOI: arxiv-2406.11246
Tomoki Matsumoto, Yuichiro Kanazawa
In the era of Big Data, analyzing large, high-dimensional datasets presents significant computational challenges. Although Bayesian statistics is well-suited for these complex data structures, Markov chain Monte Carlo (MCMC) methods, which are essential for Bayesian estimation, suffer from high computational cost because of their sequential nature. For faster and more effective computation, this paper introduces an algorithm that enhances a parallel MCMC method to address this computational problem. We highlight the critical role of the overlapped area of posterior distributions after data partitioning, and propose a method using a machine learning classifier to effectively identify and extract MCMC draws from the area to approximate the actual posterior distribution. Our main contribution is the development of a Kullback-Leibler (KL) divergence-based criterion that simplifies hyperparameter tuning in training a classifier and makes the method nearly hyperparameter-free. Simulation studies validate the efficacy of our proposed methods.
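As background for the KL divergence-based criterion, the sketch below computes the KL divergence between two univariate Gaussians in closed form, and a moment-matched estimate from two sets of draws. It is not the paper's criterion: the Gaussian approximation of the draws is an assumption made here for illustration, and `kl_gaussian` and `kl_from_draws` are hypothetical names.

```python
import math
import statistics


def kl_gaussian(mean_p, sd_p, mean_q, sd_q):
    """KL(P || Q) for P = N(mean_p, sd_p^2) and Q = N(mean_q, sd_q^2)."""
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


def kl_from_draws(draws_p, draws_q):
    """Approximate KL(P || Q) by fitting a Gaussian to each set of draws.

    Assumption: both sets of draws are roughly Gaussian; this moment-matching
    shortcut is for illustration and is not taken from the paper.
    """
    return kl_gaussian(
        statistics.fmean(draws_p), statistics.stdev(draws_p),
        statistics.fmean(draws_q), statistics.stdev(draws_q),
    )


if __name__ == "__main__":
    # Identical distributions give zero divergence; shifting the mean by one
    # standard deviation gives 0.5.
    print(kl_gaussian(0.0, 1.0, 0.0, 1.0))
    print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
```

A smaller divergence between a set of retained draws and a reference distribution indicates a closer approximation, which is the general sense in which a KL-based quantity can serve as a selection criterion.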
Citations: 0