With the increasing prevalence of probabilistic programming languages, Hamiltonian Monte Carlo (HMC) has become the mainstay of applied Bayesian inference. However, HMC still struggles to sample from densities with multiscale geometry: a large step size is needed to efficiently explore low curvature regions while a small step size is needed to accurately explore high curvature regions. We introduce the delayed rejection generalized HMC (DR-G-HMC) sampler that overcomes this challenge by employing dynamic step size selection, inspired by differential equation solvers. In a single sampling iteration, DR-G-HMC sequentially makes proposals with geometrically decreasing step sizes if necessary. This simulates Hamiltonian dynamics with increasing fidelity that, in high curvature regions, generates proposals with a higher chance of acceptance. DR-G-HMC also makes generalized HMC competitive by decreasing the number of rejections which otherwise cause inefficient backtracking and prevent directed movement. We present experiments to demonstrate that DR-G-HMC (1) correctly samples from multiscale densities, (2) makes generalized HMC methods competitive with the state-of-the-art No-U-Turn sampler, and (3) is robust to tuning parameters.
{"title":"Sampling From Multiscale Densities With Delayed Rejection Generalized Hamiltonian Monte Carlo","authors":"Gilad Turok, Chirag Modi, Bob Carpenter","doi":"arxiv-2406.02741","DOIUrl":"https://doi.org/arxiv-2406.02741","url":null,"abstract":"With the increasing prevalence of probabilistic programming languages,\u0000Hamiltonian Monte Carlo (HMC) has become the mainstay of applied Bayesian\u0000inference. However HMC still struggles to sample from densities with multiscale\u0000geometry: a large step size is needed to efficiently explore low curvature\u0000regions while a small step size is needed to accurately explore high curvature\u0000regions. We introduce the delayed rejection generalized HMC (DR-G-HMC) sampler\u0000that overcomes this challenge by employing dynamic step size selection,\u0000inspired by differential equation solvers. In a single sampling iteration,\u0000DR-G-HMC sequentially makes proposals with geometrically decreasing step sizes\u0000if necessary. This simulates Hamiltonian dynamics with increasing fidelity\u0000that, in high curvature regions, generates proposals with a higher chance of\u0000acceptance. DR-G-HMC also makes generalized HMC competitive by decreasing the\u0000number of rejections which otherwise cause inefficient backtracking and\u0000prevents directed movement. We present experiments to demonstrate that DR-G-HMC\u0000(1) correctly samples from multiscale densities, (2) makes generalized HMC\u0000methods competitive with the state of the art No-U-Turn sampler, and (3) is\u0000robust to tuning parameters.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed model fitting refers to the process of fitting a mathematical or statistical model to data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often organized in a cluster or network. Most existing methods for distributed model fitting formulate it as a consensus optimization problem and then build algorithms based on the alternating direction method of multipliers (ADMM). This paper introduces a novel parallel framework for distributed model fitting. In contrast to previous consensus frameworks, the introduced parallel framework offers two notable advantages. Firstly, it exhibits insensitivity to sample partitioning, meaning that the solution of the algorithm remains unaffected by variations in the number of slave nodes and/or the amount of data each node carries. Secondly, fewer variables need to be updated at each iteration, so the proposed parallel framework performs more succinctly and efficiently and adapts to high-dimensional data. In addition, we prove that the algorithms under the new parallel framework have a worst-case linear convergence rate in theory. Numerical experiments confirm the generality, robustness, and accuracy of our proposed parallel framework.
{"title":"A Partition-insensitive Parallel Framework for Distributed Model Fitting","authors":"Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan","doi":"arxiv-2406.00703","DOIUrl":"https://doi.org/arxiv-2406.00703","url":null,"abstract":"Distributed model fitting refers to the process of fitting a mathematical or\u0000statistical model to the data using distributed computing resources, such that\u0000computing tasks are divided among multiple interconnected computers or nodes,\u0000often organized in a cluster or network. Most of the existing methods for\u0000distributed model fitting are to formulate it in a consensus optimization\u0000problem, and then build up algorithms based on the alternating direction method\u0000of multipliers (ADMM). This paper introduces a novel parallel framework for\u0000achieving a distributed model fitting. In contrast to previous consensus\u0000frameworks, the introduced parallel framework offers two notable advantages.\u0000Firstly, it exhibits insensitivity to sample partitioning, meaning that the\u0000solution of the algorithm remains unaffected by variations in the number of\u0000slave nodes or/and the amount of data each node carries. Secondly, fewer\u0000variables are required to be updated at each iteration, so that the proposed\u0000parallel framework performs in a more succinct and efficient way, and adapts to\u0000high-dimensional data. In addition, we prove that the algorithms under the new\u0000parallel framework have a worst-case linear convergence rate in theory.\u0000Numerical experiments confirm the generality, robustness, and accuracy of our\u0000proposed parallel framework.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141255854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oscar Trull, Angel Peiro-Signes, J. Carlos Garcia-Diaz, Marival Segarra-Ona
The increase in travelers and stays in tourist destinations is leading hotels to pay attention to their ecological management and the need for efficient energy consumption. To achieve this, hotels increasingly use digitalized systems and take more frequent measurements of the variables that affect their management. Electricity plays a significant role here: predicting electricity usage in hotels can enhance their circularity, an approach aimed at sustainable and efficient resource use. In this study, neural networks are trained to predict electricity usage patterns in two hotels based on historical data. The results indicate a good accuracy level of around 2.5% MAPE, showing the potential of these techniques for electricity forecasting in hotels. Additionally, neural network models can use climatological data to improve predictions. By accurately forecasting energy demand, hotels can optimize their energy procurement and usage, move energy-intensive activities to off-peak hours to reduce costs and strain on the grid, assist in the better integration of renewable energy sources, or identify patterns and anomalies in energy consumption that suggest areas for efficiency improvements, among others. Hence, by optimizing the allocation of resources, reducing waste, and improving efficiency, these models can improve hotels' circularity.
{"title":"Prediction of energy consumption in hotels using ANN","authors":"Oscar Trull, Angel Peiro-Signes, J. Carlos Garcia-Diaz, Marival Segarra-Ona","doi":"arxiv-2405.18076","DOIUrl":"https://doi.org/arxiv-2405.18076","url":null,"abstract":"The increase in travelers and stays in tourist destinations is leading hotels\u0000to be aware of their ecological management and the need for efficient energy\u0000consumption. To achieve this, hotels are increasingly using digitalized systems\u0000and more frequent measurements are made of the variables that affect their\u0000management. Electricity can play a significant role, predicting electricity\u0000usage in hotels, which in turn can enhance their circularity - an approach\u0000aimed at sustainable and efficient resource use. In this study, neural networks\u0000are trained to predict electricity usage patterns in two hotels based on\u0000historical data. The results indicate that the predictions have a good accuracy\u0000level of around 2.5% in MAPE, showing the potential of using these techniques\u0000for electricity forecasting in hotels. Additionally, neural network models can\u0000use climatological data to improve predictions. By accurately forecasting\u0000energy demand, hotels can optimize their energy procurement and usage, moving\u0000energy-intensive activities to off-peak hours to reduce costs and strain on the\u0000grid, assisting in the better integration of renewable energy sources, or\u0000identifying patterns and anomalies in energy consumption, suggesting areas for\u0000efficiency improvements, among other. Hence, by optimizing the allocation of\u0000resources, reducing waste and improving efficiency these models can improve\u0000hotel's circularity.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141171639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matej Benko, Iwona Chlebicka, Jørgen Endal, Błażej Miasojedow
We study the spatially homogeneous granular medium equation \[\partial_t \mu = \mathrm{div}(\mu \nabla V) + \mathrm{div}(\mu (\nabla W \ast \mu)) + \Delta \mu,\] within a large and natural class of confinement potentials $V$ and interaction potentials $W$. The considered problem does not require $\nabla V$ or $\nabla W$ to be globally Lipschitz. With the aim of providing particle approximations of solutions, we design efficient forward-backward splitting algorithms. Sharp convergence rates in terms of the Wasserstein distance are provided.
{"title":"Convergence rates of particle approximation of forward-backward splitting algorithm for granular medium equations","authors":"Matej Benko, Iwona Chlebicka, Jørgen Endal, Błażej Miasojedow","doi":"arxiv-2405.18034","DOIUrl":"https://doi.org/arxiv-2405.18034","url":null,"abstract":"We study the spatially homogeneous granular medium equation\u0000[partial_tmu=rm{div}(munabla V)+rm{div}(mu(nabla W ast\u0000mu))+Deltamu,,] within a large and natural class of the confinement\u0000potentials $V$ and interaction potentials $W$. The considered problem do not\u0000need to assume that $nabla V$ or $nabla W$ are globally Lipschitz. With the\u0000aim of providing particle approximation of solutions, we design efficient\u0000forward-backward splitting algorithms. Sharp convergence rates in terms of the\u0000Wasserstein distance are provided.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141171661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The stochastic FitzHugh-Nagumo (FHN) model considered here is a two-dimensional nonlinear stochastic differential equation with additive degenerate noise, whose first component, the only one observed, describes the membrane voltage evolution of a single neuron. Due to its low dimensionality, its analytical and numerical tractability, and its neuronal interpretation, it has been used as a case study to test the performance of different statistical methods in estimating the underlying model parameters. Existing methods, however, often require complete observations, non-degeneracy of the noise, or a complex architecture (e.g., to estimate the transition density of the process, "recovering" the unobserved second component), and they may not (satisfactorily) estimate all model parameters simultaneously. Moreover, these studies lack real data applications for the stochastic FHN model. Here, we tackle all challenges (non-globally Lipschitz drift, non-explicit solution, lack of available transition density, degeneracy of the noise, and partial observations) via an intuitive and easy-to-implement sequential Monte Carlo approximate Bayesian computation algorithm. The proposed method relies on a recent computationally efficient and structure-preserving numerical splitting scheme for synthetic data generation, and on summary statistics exploiting the structural properties of the process. We succeed in estimating all model parameters from simulated data and, more remarkably, from real action potential data of rats. The presented novel real-data fit may broaden the scope and credibility of this classic and widely used neuronal model.
{"title":"Inference for the stochastic FitzHugh-Nagumo model from real action potential data via approximate Bayesian computation","authors":"Adeline Samson, Massimiliano Tamborrino, Irene Tubikanec","doi":"arxiv-2405.17972","DOIUrl":"https://doi.org/arxiv-2405.17972","url":null,"abstract":"The stochastic FitzHugh-Nagumo (FHN) model considered here is a\u0000two-dimensional nonlinear stochastic differential equation with additive\u0000degenerate noise, whose first component, the only one observed, describes the\u0000membrane voltage evolution of a single neuron. Due to its low dimensionality,\u0000its analytical and numerical tractability, and its neuronal interpretation, it\u0000has been used as a case study to test the performance of different statistical\u0000methods in estimating the underlying model parameters. Existing methods,\u0000however, often require complete observations, non-degeneracy of the noise or a\u0000complex architecture (e.g., to estimate the transition density of the process,\u0000\"recovering\" the unobserved second component), and they may not\u0000(satisfactorily) estimate all model parameters simultaneously. Moreover, these\u0000studies lack real data applications for the stochastic FHN model. Here, we\u0000tackle all challenges (non-globally Lipschitz drift, non-explicit solution,\u0000lack of available transition density, degeneracy of the noise, and partial\u0000observations) via an intuitive and easy-to-implement sequential Monte Carlo\u0000approximate Bayesian computation algorithm. The proposed method relies on a\u0000recent computationally efficient and structure-preserving numerical splitting\u0000scheme for synthetic data generation, and on summary statistics exploiting the\u0000structural properties of the process. We succeed in estimating all model\u0000parameters from simulated data and, more remarkably, real action potential data\u0000of rats. The presented novel real-data fit may broaden the scope and\u0000credibility of this classic and widely used neuronal model.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141172923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This is a concise mathematical introduction to Monte Carlo methods, a rich family of algorithms with far-reaching applications in science and engineering. Monte Carlo methods are an exciting subject for mathematical statisticians and computational and applied mathematicians: the design and analysis of modern algorithms are rooted in a broad mathematical toolbox that includes ergodic theory of Markov chains, Hamiltonian dynamical systems, transport maps, stochastic differential equations, information theory, optimization, Riemannian geometry, and gradient flows, among many others. These lecture notes celebrate the breadth of mathematical ideas that have led to tangible advancements in Monte Carlo methods and their applications. To accommodate a diverse audience, the level of mathematical rigor varies from chapter to chapter, giving only an intuitive treatment to the most technically demanding subjects. The aim is not to be comprehensive or encyclopedic, but rather to illustrate some key principles in the design and analysis of Monte Carlo methods through a carefully-crafted choice of topics that emphasizes timeless over timely ideas. Algorithms are presented in a way that is conducive to conceptual understanding and mathematical analysis -- clarity and intuition are favored over state-of-the-art implementations that are harder to comprehend or rely on ad-hoc heuristics. To help readers navigate the expansive landscape of Monte Carlo methods, each algorithm is accompanied by a summary of its pros and cons, and by a discussion of the type of problems for which they are most useful. The presentation is self-contained, and therefore adequate for self-guided learning or as a teaching resource. Each chapter contains a section with bibliographic remarks that will be useful for those interested in conducting research on Monte Carlo methods and their applications.
{"title":"A First Course in Monte Carlo Methods","authors":"Daniel Sanz-Alonso, Omar Al-Ghattas","doi":"arxiv-2405.16359","DOIUrl":"https://doi.org/arxiv-2405.16359","url":null,"abstract":"This is a concise mathematical introduction to Monte Carlo methods, a rich\u0000family of algorithms with far-reaching applications in science and engineering.\u0000Monte Carlo methods are an exciting subject for mathematical statisticians and\u0000computational and applied mathematicians: the design and analysis of modern\u0000algorithms are rooted in a broad mathematical toolbox that includes ergodic\u0000theory of Markov chains, Hamiltonian dynamical systems, transport maps,\u0000stochastic differential equations, information theory, optimization, Riemannian\u0000geometry, and gradient flows, among many others. These lecture notes celebrate\u0000the breadth of mathematical ideas that have led to tangible advancements in\u0000Monte Carlo methods and their applications. To accommodate a diverse audience,\u0000the level of mathematical rigor varies from chapter to chapter, giving only an\u0000intuitive treatment to the most technically demanding subjects. The aim is not\u0000to be comprehensive or encyclopedic, but rather to illustrate some key\u0000principles in the design and analysis of Monte Carlo methods through a\u0000carefully-crafted choice of topics that emphasizes timeless over timely ideas.\u0000Algorithms are presented in a way that is conducive to conceptual understanding\u0000and mathematical analysis -- clarity and intuition are favored over\u0000state-of-the-art implementations that are harder to comprehend or rely on\u0000ad-hoc heuristics. To help readers navigate the expansive landscape of Monte\u0000Carlo methods, each algorithm is accompanied by a summary of its pros and cons,\u0000and by a discussion of the type of problems for which they are most useful. The\u0000presentation is self-contained, and therefore adequate for self-guided learning\u0000or as a teaching resource. Each chapter contains a section with bibliographic\u0000remarks that will be useful for those interested in conducting research on\u0000Monte Carlo methods and their applications.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141171638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samuel Livingstone, Nikolas Nüsken, Giorgos Vasdekis, Rui-Yang Zhang
We propose a new, simple, and explicit numerical scheme for time-homogeneous stochastic differential equations. The scheme is based on sampling increments at each time step from a skew-symmetric probability distribution, with the level of skewness determined by the drift and volatility of the underlying process. We show that as the step-size decreases the scheme converges weakly to the diffusion of interest. We then consider the problem of simulating from the limiting distribution of an ergodic diffusion process using the numerical scheme with a fixed step-size. We establish conditions under which the numerical scheme converges to equilibrium at a geometric rate, and quantify the bias between the equilibrium distributions of the scheme and of the true diffusion process. Notably, our results do not require a global Lipschitz assumption on the drift, in contrast to those required for the Euler--Maruyama scheme for long-time simulation at fixed step-sizes. Our weak convergence result relies on an extension of the theory of Milstein & Tretyakov to stochastic differential equations with non-Lipschitz drift, which could also be of independent interest. We support our theoretical results with numerical simulations.
{"title":"Skew-symmetric schemes for stochastic differential equations with non-Lipschitz drift: an unadjusted Barker algorithm","authors":"Samuel Livingstone, Nikolas Nüsken, Giorgos Vasdekis, Rui-Yang Zhang","doi":"arxiv-2405.14373","DOIUrl":"https://doi.org/arxiv-2405.14373","url":null,"abstract":"We propose a new simple and explicit numerical scheme for time-homogeneous\u0000stochastic differential equations. The scheme is based on sampling increments\u0000at each time step from a skew-symmetric probability distribution, with the\u0000level of skewness determined by the drift and volatility of the underlying\u0000process. We show that as the step-size decreases the scheme converges weakly to\u0000the diffusion of interest. We then consider the problem of simulating from the\u0000limiting distribution of an ergodic diffusion process using the numerical\u0000scheme with a fixed step-size. We establish conditions under which the\u0000numerical scheme converges to equilibrium at a geometric rate, and quantify the\u0000bias between the equilibrium distributions of the scheme and of the true\u0000diffusion process. Notably, our results do not require a global Lipschitz\u0000assumption on the drift, in contrast to those required for the Euler--Maruyama\u0000scheme for long-time simulation at fixed step-sizes. Our weak convergence\u0000result relies on an extension of the theory of Milstein & Tretyakov to\u0000stochastic differential equations with non-Lipschitz drift, which could also be\u0000of independent interest. We support our theoretical results with numerical\u0000simulations.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iris Rammelmüller, Gottfried Hastermann, Jana de Wiljes
Data assimilation algorithms integrate prior information from numerical model simulations with observed data. Ensemble-based filters, regarded as state-of-the-art, are widely employed for large-scale estimation tasks in disciplines such as geoscience and meteorology. Despite their inability to produce the true posterior distribution for nonlinear systems, their robustness and capacity for state tracking are noteworthy. In contrast, particle filters yield the correct distribution in the ensemble limit but require substantially larger ensemble sizes than ensemble-based filters to maintain stability in higher-dimensional spaces. Transcending traditional Gaussian assumptions is essential for realistic quantification of uncertainties. One approach involves the hybridisation of filters, facilitated by tempering, to harness the complementary strengths of different filters. A new adaptive tempering method is proposed to tune the underlying schedule, aiming to systematically surpass previously achieved performance. Although promising numerical results for certain filter combinations in toy examples exist in the literature, the tuning of hyperparameters presents a considerable challenge, and a deeper understanding of these interactions is crucial for practical applications.
{"title":"Adaptive tempering schedules with approximative intermediate measures for filtering problems","authors":"Iris Rammelmüller, Gottfried Hastermann, Jana de Wiljes","doi":"arxiv-2405.14408","DOIUrl":"https://doi.org/arxiv-2405.14408","url":null,"abstract":"Data assimilation algorithms integrate prior information from numerical model\u0000simulations with observed data. Ensemble-based filters, regarded as\u0000state-of-the-art, are widely employed for large-scale estimation tasks in\u0000disciplines such as geoscience and meteorology. Despite their inability to\u0000produce the true posterior distribution for nonlinear systems, their robustness\u0000and capacity for state tracking are noteworthy. In contrast, Particle filters\u0000yield the correct distribution in the ensemble limit but require substantially\u0000larger ensemble sizes than ensemble-based filters to maintain stability in\u0000higher-dimensional spaces. It is essential to transcend traditional Gaussian\u0000assumptions to achieve realistic quantification of uncertainties. One approach\u0000involves the hybridisation of filters, facilitated by tempering, to harness the\u0000complementary strengths of different filters. A new adaptive tempering method\u0000is proposed to tune the underlying schedule, aiming to systematically surpass\u0000the performance previously achieved. Although promising numerical results for\u0000certain filter combinations in toy examples exist in the literature, the tuning\u0000of hyperparameters presents a considerable challenge. A deeper understanding of\u0000these interactions is crucial for practical applications.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris J. Oates
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90\%$ of tasks in the PosteriorDB benchmark.
{"title":"Reinforcement Learning for Adaptive MCMC","authors":"Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris. J. Oates","doi":"arxiv-2405.13574","DOIUrl":"https://doi.org/arxiv-2405.13574","url":null,"abstract":"An informal observation, made by several authors, is that the adaptive design\u0000of a Markov transition kernel has the flavour of a reinforcement learning task.\u0000Yet, to-date it has remained unclear how to actually exploit modern\u0000reinforcement learning technologies for adaptive MCMC. The aim of this paper is\u0000to set out a general framework, called Reinforcement Learning\u0000Metropolis--Hastings, that is theoretically supported and empirically\u0000validated. Our principal focus is on learning fast-mixing Metropolis--Hastings\u0000transition kernels, which we cast as deterministic policies and optimise via a\u0000policy gradient. Control of the learning rate provably ensures conditions for\u0000ergodicity are satisfied. The methodology is used to construct a gradient-free\u0000sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings\u0000algorithm on $approx 90 %$ of tasks in the PosteriorDB benchmark.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In geostatistics, traditional spatial models often rely on the Gaussian Process (GP) to fit stationary covariances to data. It is well known that this approach becomes computationally infeasible when dealing with large data volumes, necessitating the use of approximate methods. A powerful class of methods approximates the GP as a sum of basis functions with random coefficients. Although this technique offers computational efficiency, it does not inherently guarantee a stationary covariance. To mitigate this issue, the basis functions can be "normalized" to maintain a constant marginal variance, avoiding unwanted artifacts and edge effects. This allows for the fitting of nearly stationary models to large, potentially non-stationary datasets, providing a rigorous basis for extension to more complex problems. Unfortunately, the process of normalizing these basis functions is computationally demanding. To address this, we introduce two fast and accurate algorithms for the normalization step, allowing for efficient prediction on fine grids. The practical value of these algorithms is showcased in the context of a spatial analysis on a large dataset, where significant computational speedups are achieved. While implementation and testing are done specifically within the LatticeKrig framework, these algorithms can be adapted to other basis function methods operating on regular grids.
{"title":"Normalizing Basis Functions: Approximate Stationary Models for Large Spatial Data","authors":"Antony Sikorski, Daniel McKenzie, Douglas Nychka","doi":"arxiv-2405.13821","DOIUrl":"https://doi.org/arxiv-2405.13821","url":null,"abstract":"In geostatistics, traditional spatial models often rely on the Gaussian\u0000Process (GP) to fit stationary covariances to data. It is well known that this\u0000approach becomes computationally infeasible when dealing with large data\u0000volumes, necessitating the use of approximate methods. A powerful class of\u0000methods approximate the GP as a sum of basis functions with random\u0000coefficients. Although this technique offers computational efficiency, it does\u0000not inherently guarantee a stationary covariance. To mitigate this issue, the\u0000basis functions can be \"normalized\" to maintain a constant marginal variance,\u0000avoiding unwanted artifacts and edge effects. This allows for the fitting of\u0000nearly stationary models to large, potentially non-stationary datasets,\u0000providing a rigorous base to extend to more complex problems. Unfortunately,\u0000the process of normalizing these basis functions is computationally demanding.\u0000To address this, we introduce two fast and accurate algorithms to the\u0000normalization step, allowing for efficient prediction on fine grids. The\u0000practical value of these algorithms is showcased in the context of a spatial\u0000analysis on a large dataset, where significant computational speedups are\u0000achieved. While implementation and testing are done specifically within the\u0000LatticeKrig framework, these algorithms can be adapted to other basis function\u0000methods operating on regular grids.","PeriodicalId":501215,"journal":{"name":"arXiv - STAT - Computation","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}