Thomas Gaskin, Marie-Therese Wolfram, Andrew Duncan, Guven Demirel
Global trade is shaped by a complex mix of factors beyond supply and demand, including tangible variables like transport costs and tariffs, as well as less quantifiable influences such as political and economic relations. Traditionally, economists model trade using gravity models, which rely on explicit covariates but often struggle to capture these subtler drivers of trade. In this work, we employ optimal transport and a deep neural network to learn a time-dependent cost function from data, without imposing a specific functional form. This approach consistently outperforms traditional gravity models in accuracy while providing natural uncertainty quantification. Applying our framework to global food and agricultural trade, we show that the global South suffered disproportionately from the war in Ukraine's impact on wheat markets. We also analyze the effects of free-trade agreements and trade disputes with China, as well as Brexit's impact on British trade with Europe, uncovering hidden patterns that trade volumes alone cannot reveal.
{"title":"Modelling Global Trade with Optimal Transport","authors":"Thomas Gaskin, Marie-Therese Wolfram, Andrew Duncan, Guven Demirel","doi":"arxiv-2409.06554","DOIUrl":"https://doi.org/arxiv-2409.06554","url":null,"abstract":"Global trade is shaped by a complex mix of factors beyond supply and demand,\u0000including tangible variables like transport costs and tariffs, as well as less\u0000quantifiable influences such as political and economic relations.\u0000Traditionally, economists model trade using gravity models, which rely on\u0000explicit covariates but often struggle to capture these subtler drivers of\u0000trade. In this work, we employ optimal transport and a deep neural network to\u0000learn a time-dependent cost function from data, without imposing a specific\u0000functional form. This approach consistently outperforms traditional gravity\u0000models in accuracy while providing natural uncertainty quantification. Applying\u0000our framework to global food and agricultural trade, we show that the global\u0000South suffered disproportionately from the war in Ukraine's impact on wheat\u0000markets. We also analyze the effects of free-trade agreements and trade\u0000disputes with China, as well as Brexit's impact on British trade with Europe,\u0000uncovering hidden patterns that trade volumes alone cannot reveal.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The quasi-potential is a key concept in stochastic systems, as it accounts for the long-term behavior of their dynamics. It also allows us to estimate mean exit times from the attractors of the system and transition rates between states. This is significant in many applications across areas such as physics, biology, ecology, and economics. The quasi-potential is often computed via a functional minimization problem that can be challenging. This paper combines a sparse learning technique with action minimization methods in order to: (i) identify the orthogonal decomposition of the deterministic vector field (drift) driving the stochastic dynamics; (ii) determine the quasi-potential from this decomposition. This decomposition of the drift vector field into its gradient and orthogonal parts is accomplished with the help of a machine learning-based sparse identification technique. Specifically, the so-called sparse identification of non-linear dynamics (SINDy) [1] is applied to the most likely trajectory in a stochastic system (the instanton) to learn the orthogonal decomposition of the drift. Consequently, the quasi-potential can be evaluated even at points outside the instanton path, allowing our method to provide the complete quasi-potential landscape from this single trajectory. Additionally, the orthogonal drift component obtained within our framework is important as a correction to the exponential decay of transition rates and exit times. We implemented the proposed approach in 2- and 3-D systems, covering various types of potential landscapes and attractors.
{"title":"Quasi-potential and drift decomposition in stochastic systems by sparse identification","authors":"Leonardo Grigorio, Mnerh Alqahtani","doi":"arxiv-2409.06886","DOIUrl":"https://doi.org/arxiv-2409.06886","url":null,"abstract":"The quasi-potential is a key concept in stochastic systems as it accounts for\u0000the long-term behavior of the dynamics of such systems. It also allows us to\u0000estimate mean exit times from the attractors of the system, and transition\u0000rates between states. This is of significance in many applications across\u0000various areas such as physics, biology, ecology, and economy. Computation of\u0000the quasi-potential is often obtained via a functional minimization problem\u0000that can be challenging. This paper combines a sparse learning technique with\u0000action minimization methods in order to: (i) Identify the orthogonal\u0000decomposition of the deterministic vector field (drift) driving the stochastic\u0000dynamics; (ii) Determine the quasi-potential from this decomposition. This\u0000decomposition of the drift vector field into its gradient and orthogonal parts\u0000is accomplished with the help of a machine learning-based sparse identification\u0000technique. Specifically, the so-called sparse identification of non-linear\u0000dynamics (SINDy) [1] is applied to the most likely trajectory in a stochastic\u0000system (instanton) to learn the orthogonal decomposition of the drift.\u0000Consequently, the quasi-potential can be evaluated even at points outside the\u0000instanton path, allowing our method to provide the complete quasi-potential\u0000landscape from this single trajectory. Additionally, the orthogonal drift\u0000component obtained within our framework is important as a correction to the\u0000exponential decay of transition rates and exit times. 
We implemented the\u0000proposed approach in 2- and 3-D systems, covering various types of potential\u0000landscapes and attractors.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
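The orthogonal decomposition at the heart of this method can be checked on a toy example where the quasi-potential is known in closed form. Below is a minimal numerical sketch (not the paper's SINDy-based procedure): a 2-D linear drift with quasi-potential V(x, y) = (x^2 + y^2)/2 plus a rotational component, where the two parts are verified to be pointwise orthogonal.

```python
import numpy as np

# Toy decomposition b = -grad V + l with grad V orthogonal to l.
# V(x, y) = (x^2 + y^2)/2; rotational part l = alpha * (-y, x).
alpha = 0.7

def drift(p):
    x, y = p
    return np.array([-x - alpha * y, -y + alpha * x])

def grad_V(p):
    return np.array(p, dtype=float)      # grad V = (x, y)

def orthogonal_part(p):
    return drift(p) + grad_V(p)          # l = b + grad V

rng = np.random.default_rng(0)
for p in rng.normal(size=(5, 2)):
    l = orthogonal_part(p)
    # the rotational component is pointwise orthogonal to grad V
    assert abs(np.dot(grad_V(p), l)) < 1e-12
```

The paper's contribution is learning this split from data along the instanton; here the split is hard-coded so the orthogonality property itself is visible.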
Seismic inversion is essential for geophysical exploration and geological assessment, but it is inherently subject to significant uncertainty. This uncertainty stems primarily from the limited information provided by observed seismic data, which is largely a result of constraints in data collection geometry. As a result, multiple plausible velocity models can often explain the same set of seismic observations. In deep learning-based seismic inversion, uncertainty arises from various sources, including data noise, neural network design and training, and inherent data limitations. This study introduces a novel approach to uncertainty quantification in seismic inversion by integrating ensemble methods with importance sampling. By combining an ensemble approach with importance sampling, we enhance the accuracy of uncertainty analysis while maintaining computational efficiency. The method involves initializing each model in the ensemble with different weights, introducing diversity into the predictions and thereby improving the robustness and reliability of the inversion outcomes. Additionally, importance sampling weights the contribution of each ensemble sample, allowing us to use a limited number of ensemble samples to obtain more accurate estimates of the posterior distribution. Our approach enables more precise quantification of uncertainty in velocity models derived from seismic data. By utilizing a limited number of ensemble samples, this method achieves an accurate and reliable assessment of uncertainty, ultimately providing greater confidence in seismic inversion results.
{"title":"Uncertainty Quantification in Seismic Inversion Through Integrated Importance Sampling and Ensemble Methods","authors":"Luping Qu, Mauricio Araya-Polo, Laurent Demanet","doi":"arxiv-2409.06840","DOIUrl":"https://doi.org/arxiv-2409.06840","url":null,"abstract":"Seismic inversion is essential for geophysical exploration and geological\u0000assessment, but it is inherently subject to significant uncertainty. This\u0000uncertainty stems primarily from the limited information provided by observed\u0000seismic data, which is largely a result of constraints in data collection\u0000geometry. As a result, multiple plausible velocity models can often explain the\u0000same set of seismic observations. In deep learning-based seismic inversion,\u0000uncertainty arises from various sources, including data noise, neural network\u0000design and training, and inherent data limitations. This study introduces a\u0000novel approach to uncertainty quantification in seismic inversion by\u0000integrating ensemble methods with importance sampling. By leveraging ensemble\u0000approach in combination with importance sampling, we enhance the accuracy of\u0000uncertainty analysis while maintaining computational efficiency. The method\u0000involves initializing each model in the ensemble with different weights,\u0000introducing diversity in predictions and thereby improving the robustness and\u0000reliability of the inversion outcomes. Additionally, the use of importance\u0000sampling weights the contribution of each ensemble sample, allowing us to use a\u0000limited number of ensemble samples to obtain more accurate estimates of the\u0000posterior distribution. Our approach enables more precise quantification of\u0000uncertainty in velocity models derived from seismic data. 
By utilizing a\u0000limited number of ensemble samples, this method achieves an accurate and\u0000reliable assessment of uncertainty, ultimately providing greater confidence in\u0000seismic inversion results.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"203 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
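The reweighting step can be illustrated with self-normalised importance sampling in one dimension. This is a schematic stand-in for the paper's method: the "ensemble" is a cloud of scalar draws from a broad proposal, and the target posterior is a Gaussian chosen purely for illustration.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(42)
proposal_mu, proposal_sigma = 0.0, 2.0     # diverse ensemble initialisation
target_mu, target_sigma = 1.0, 0.5         # illustrative "true" posterior

# ensemble members drawn from the broad proposal q
samples = rng.normal(proposal_mu, proposal_sigma, size=20_000)

# importance weights w = p(sample) / q(sample), self-normalised
w = gauss_pdf(samples, target_mu, target_sigma) / gauss_pdf(samples, proposal_mu, proposal_sigma)
w /= w.sum()

# weighted ensemble statistics approximate the posterior moments
post_mean = np.sum(w * samples)
post_var = np.sum(w * (samples - post_mean) ** 2)
assert abs(post_mean - target_mu) < 0.05
assert abs(post_var - target_sigma ** 2) < 0.05
```

In the actual inversion setting each "sample" would be a full velocity model produced by one ensemble member, with densities replaced by likelihood evaluations against the seismic data.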
Michael Scheuerer, Claudio Heinrich-Mertsching, Titike K. Bahaga, Masilin Gudoshava, Thordis L. Thorarinsdottir
Seasonal climate forecasts are commonly based on model runs from fully coupled forecasting systems that use Earth system models to represent interactions between the atmosphere, ocean, land and other Earth-system components. Recently, machine learning (ML) methods have increasingly been investigated for this task, where large-scale climate variability is linked to local or regional temperature or precipitation in a linear or non-linear fashion. This paper investigates the use of interpretable ML methods to predict seasonal precipitation for East Africa in an operational setting. Dimension reduction is performed by decomposing the precipitation fields via empirical orthogonal functions (EOFs), such that only the respective factor loadings need to be predicted. Indices of large-scale climate variability--including the rate of change in individual indices as well as interactions between different indices--are then used as potential features to obtain tercile forecasts from an interpretable ML algorithm. Several research questions regarding the use of data and the effect of model complexity are studied. The results are compared against the ECMWF seasonal forecasting system (SEAS5) for three seasons--MAM, JJAS and OND--over the period 1993-2020. Compared to climatology for the same period, the ECMWF forecasts have negative skill in MAM and JJAS and significant positive skill in OND. The ML approach is on par with climatology in MAM and JJAS and has significantly positive skill in OND, if not quite at the level of the OND ECMWF forecast.
{"title":"Applications of machine learning to predict seasonal precipitation for East Africa","authors":"Michael Scheuerer, Claudio Heinrich-Mertsching, Titike K. Bahaga, Masilin Gudoshava, Thordis L. Thorarinsdottir","doi":"arxiv-2409.06238","DOIUrl":"https://doi.org/arxiv-2409.06238","url":null,"abstract":"Seasonal climate forecasts are commonly based on model runs from fully\u0000coupled forecasting systems that use Earth system models to represent\u0000interactions between the atmosphere, ocean, land and other Earth-system\u0000components. Recently, machine learning (ML) methods are increasingly being\u0000investigated for this task where large-scale climate variability is linked to\u0000local or regional temperature or precipitation in a linear or non-linear\u0000fashion. This paper investigates the use of interpretable ML methods to predict\u0000seasonal precipitation for East Africa in an operational setting. Dimension\u0000reduction is performed by decomposing the precipitation fields via empirical\u0000orthogonal functions (EOFs), such that only the respective factor loadings need\u0000to the predicted. Indices of large-scale climate variability--including the\u0000rate of change in individual indices as well as interactions between different\u0000indices--are then used as potential features to obtain tercile forecasts from\u0000an interpretable ML algorithm. Several research questions regarding the use of\u0000data and the effect of model complexity are studied. The results are compared\u0000against the ECMWF seasonal forecasting system (SEAS5) for three seasons--MAM,\u0000JJAS and OND--over the period 1993-2020. Compared to climatology for the same\u0000period, the ECMWF forecasts have negative skill in MAM and JJAS and significant\u0000positive skill in OND. 
The ML approach is on par with climatology in MAM and\u0000JJAS and a significantly positive skill in OND, if not quite at the level of\u0000the OND ECMWF forecast.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
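The dimension-reduction step, an EOF decomposition where only the loadings remain to be predicted, is equivalent to a truncated SVD of the anomaly field. A minimal sketch on synthetic data (random numbers standing in for precipitation anomalies; the grid size and mode count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_grid = 28, 500                 # e.g. 28 seasons x 500 grid cells
field = rng.normal(size=(n_time, n_grid))

anom = field - field.mean(axis=0)        # anomalies w.r.t. climatology
U, s, Vt = np.linalg.svd(anom, full_matrices=False)

k = 5                                    # keep the leading modes
loadings = U[:, :k] * s[:k]              # time-dependent factor loadings
eofs = Vt[:k]                            # spatial patterns (EOFs)

recon = loadings @ eofs                  # rank-k reconstruction of the field
explained = 1 - np.sum((anom - recon) ** 2) / np.sum(anom ** 2)
assert 0 < explained < 1
assert loadings.shape == (n_time, k)
```

A forecasting model then only has to predict the k loadings per season, rather than the full gridded field; multiplying predicted loadings by the fixed EOFs recovers a spatial forecast.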
Ensemble methods such as random forests (RF) have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. However, despite their effectiveness, these methods often lack transparency, impeding users' comprehension of how RF models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it not only accounts for the effects of predictor variables on the response but also accounts for associations between the predictor variables through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for use in classification tasks. In this paper, we extend the methodology to encompass regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.
{"title":"Extending Explainable Ensemble Trees (E2Tree) to regression contexts","authors":"Massimo Aria, Agostino Gnasso, Carmela Iorio, Marjolein Fokkema","doi":"arxiv-2409.06439","DOIUrl":"https://doi.org/arxiv-2409.06439","url":null,"abstract":"Ensemble methods such as random forests have transformed the landscape of\u0000supervised learning, offering highly accurate prediction through the\u0000aggregation of multiple weak learners. However, despite their effectiveness,\u0000these methods often lack transparency, impeding users' comprehension of how RF\u0000models arrive at their predictions. Explainable ensemble trees (E2Tree) is a\u0000novel methodology for explaining random forests, that provides a graphical\u0000representation of the relationship between response variables and predictors. A\u0000striking characteristic of E2Tree is that it not only accounts for the effects\u0000of predictor variables on the response but also accounts for associations\u0000between the predictor variables through the computation and use of\u0000dissimilarity measures. The E2Tree methodology was initially proposed for use\u0000in classification tasks. In this paper, we extend the methodology to encompass\u0000regression contexts. To demonstrate the explanatory power of the proposed\u0000algorithm, we illustrate its use on real-world datasets.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel A. Messenger, April Tran, Vanja Dukic, David M. Bortz
The weak form is a ubiquitous, well-studied, and widely utilized mathematical tool in modern computational and applied mathematics. In this work we provide a survey of both the history and recent developments for several fields in which the weak form can play a critical role. In particular, we highlight several recent advances in weak form versions of equation learning, parameter estimation, and coarse graining, which offer surprising noise robustness, accuracy, and computational efficiency. We note that this manuscript is a companion piece to our October 2024 SIAM News article of the same name. Here we provide more detailed explanations of mathematical developments as well as a more complete list of references. Lastly, we note that the software with which to reproduce the results in this manuscript is also available on our group's GitHub website: https://github.com/MathBioCU
{"title":"The Weak Form Is Stronger Than You Think","authors":"Daniel A. Messenger, April Tran, Vanja Dukic, David M. Bortz","doi":"arxiv-2409.06751","DOIUrl":"https://doi.org/arxiv-2409.06751","url":null,"abstract":"The weak form is a ubiquitous, well-studied, and widely-utilized mathematical\u0000tool in modern computational and applied mathematics. In this work we provide a\u0000survey of both the history and recent developments for several fields in which\u0000the weak form can play a critical role. In particular, we highlight several\u0000recent advances in weak form versions of equation learning, parameter\u0000estimation, and coarse graining, which offer surprising noise robustness,\u0000accuracy, and computational efficiency. We note that this manuscript is a companion piece to our October 2024 SIAM\u0000News article of the same name. Here we provide more detailed explanations of\u0000mathematical developments as well as a more complete list of references.\u0000Lastly, we note that the software with which to reproduce the results in this\u0000manuscript is also available on our group's GitHub website\u0000https://github.com/MathBioCU .","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
We develop variational search distributions (VSD), a method for finding discrete, combinatorial designs of a rare desired class in a batch sequential manner with a fixed experimental budget. We formalize the requirements and desiderata for this problem and formulate a solution via variational inference that fulfills them. In particular, VSD uses off-the-shelf gradient-based optimization routines, and can take advantage of scalable predictive models. We show that VSD can outperform existing baseline methods on a set of real sequence-design problems in various biological systems.
{"title":"Variational Search Distributions","authors":"Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla","doi":"arxiv-2409.06142","DOIUrl":"https://doi.org/arxiv-2409.06142","url":null,"abstract":"We develop variational search distributions (VSD), a method for finding\u0000discrete, combinatorial designs of a rare desired class in a batch sequential\u0000manner with a fixed experimental budget. We formalize the requirements and\u0000desiderata for this problem and formulate a solution via variational inference\u0000that fulfill these. In particular, VSD uses off-the-shelf gradient based\u0000optimization routines, and can take advantage of scalable predictive models. We\u0000show that VSD can outperform existing baseline methods on a set of real\u0000sequence-design problems in various biological systems.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Network inference, the task of reconstructing interactions in a complex system from experimental observables, is a central yet extremely challenging problem in systems biology. While much progress has been made in the last two decades, network inference remains an open problem. For systems observed at steady state, limited insights are available, since temporal information is unavailable and thus causal information is lost. Two common avenues for gaining causal insights into system behaviour are to leverage temporal dynamics in the form of trajectories, and to apply interventions such as knock-out perturbations. We propose an approach for leveraging both dynamical and perturbational single cell data to jointly learn cellular trajectories and to power network inference. Our approach is motivated by min-entropy estimation for stochastic dynamics and can infer directed and signed networks from time-stamped single cell snapshots.
{"title":"Joint trajectory and network inference via reference fitting","authors":"Stephen Y Zhang","doi":"arxiv-2409.06879","DOIUrl":"https://doi.org/arxiv-2409.06879","url":null,"abstract":"Network inference, the task of reconstructing interactions in a complex\u0000system from experimental observables, is a central yet extremely challenging\u0000problem in systems biology. While much progress has been made in the last two\u0000decades, network inference remains an open problem. For systems observed at\u0000steady state, limited insights are available since temporal information is\u0000unavailable and thus causal information is lost. Two common avenues for gaining\u0000causal insights into system behaviour are to leverage temporal dynamics in the\u0000form of trajectories, and to apply interventions such as knock-out\u0000perturbations. We propose an approach for leveraging both dynamical and\u0000perturbational single cell data to jointly learn cellular trajectories and\u0000power network inference. Our approach is motivated by min-entropy estimation\u0000for stochastic dynamics and can infer directed and signed networks from\u0000time-stamped single cell snapshots.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex Glyn-Davies, Arnaud Vadeboncoeur, O. Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami
Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics-related problems. The central learning objective of VI must often be tailored to new learning tasks, where the nature of the problem dictates the conditional dependence between the variables of interest, as commonly arises in physics problems. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.
{"title":"A Primer on Variational Inference for Physics-Informed Deep Generative Modelling","authors":"Alex Glyn-Davies, Arnaud Vadeboncoeur, O. Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami","doi":"arxiv-2409.06560","DOIUrl":"https://doi.org/arxiv-2409.06560","url":null,"abstract":"Variational inference (VI) is a computationally efficient and scalable\u0000methodology for approximate Bayesian inference. It strikes a balance between\u0000accuracy of uncertainty quantification and practical tractability. It excels at\u0000generative modelling and inversion tasks due to its built-in Bayesian\u0000regularisation and flexibility, essential qualities for physics related\u0000problems. Deriving the central learning objective for VI must often be tailored\u0000to new learning tasks where the nature of the problems dictates the conditional\u0000dependence between variables of interest, such as arising in physics problems.\u0000In this paper, we provide an accessible and thorough technical introduction to\u0000VI for forward and inverse problems, guiding the reader through standard\u0000derivations of the VI framework and how it can best be realized through deep\u0000learning. We then review and unify recent literature exemplifying the creative\u0000flexibility allowed by VI. 
This paper is designed for a general scientific\u0000audience looking to solve physics-based problems with an emphasis on\u0000uncertainty quantification.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
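The identity that anchors most VI derivations, log p(x) = ELBO(q) + KL(q || posterior), can be verified numerically in a fully conjugate Gaussian model where every term is available in closed form. All numbers below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

# Conjugate model: prior z ~ N(0, 1), likelihood x | z ~ N(z, sigma2),
# variational family q = N(m, s2). Here q is deliberately suboptimal.
sigma2 = 0.5
x = 1.3
m, s2 = 0.4, 0.8

# exact posterior and log evidence (both closed form in this model)
post_var = 1.0 / (1.0 + 1.0 / sigma2)
post_mean = post_var * x / sigma2
log_evidence = -0.5 * np.log(2 * np.pi * (1 + sigma2)) - x**2 / (2 * (1 + sigma2))

# ELBO(q) = E_q[log p(x|z)] + E_q[log p(z)] - E_q[log q(z)]
e_loglik = -0.5 * np.log(2 * np.pi * sigma2) - ((x - m) ** 2 + s2) / (2 * sigma2)
e_logprior = -0.5 * np.log(2 * np.pi) - (m**2 + s2) / 2
entropy_q = 0.5 * (np.log(2 * np.pi * s2) + 1)
elbo = e_loglik + e_logprior + entropy_q

# KL between the two Gaussians q and the exact posterior
kl = 0.5 * (np.log(post_var / s2) + (s2 + (m - post_mean) ** 2) / post_var - 1)

# the decomposition log p(x) = ELBO + KL holds exactly
assert abs(log_evidence - (elbo + kl)) < 1e-10
```

Because KL is non-negative, maximising the ELBO over (m, s2) is equivalent to driving q towards the exact posterior, which is the starting point for every derivation the primer walks through.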
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $\mathbb{R}^d$, where $d\ge 1$, and $M$ classes, thereby ensuring accurate classification. By modeling the neural network as a time-discrete nonlinear dynamical system, we interpret the memorization property as a problem of simultaneous or ensemble controllability. This problem is addressed by constructing the network parameters inductively and explicitly, bypassing the need for training or solving any optimization problem. Additionally, we establish that such a network can achieve universal approximation in $L^p(\Omega;\mathbb{R}_+)$, where $\Omega$ is a bounded subset of $\mathbb{R}^d$ and $p\in[1,\infty)$, using a ReLU deep neural network with a width of $d+1$. We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$. Our proofs are constructive, offering explicit values for the biases and weights involved.
{"title":"Deep Neural Networks: Multi-Classification and Universal Approximation","authors":"Martín Hernández, Enrique Zuazua","doi":"arxiv-2409.06555","DOIUrl":"https://doi.org/arxiv-2409.06555","url":null,"abstract":"We demonstrate that a ReLU deep neural network with a width of $2$ and a\u0000depth of $2N+4M-1$ layers can achieve finite sample memorization for any\u0000dataset comprising $N$ elements in $mathbb{R}^d$, where $dge1,$ and $M$\u0000classes, thereby ensuring accurate classification. By modeling the neural network as a time-discrete nonlinear dynamical system,\u0000we interpret the memorization property as a problem of simultaneous or ensemble\u0000controllability. This problem is addressed by constructing the network\u0000parameters inductively and explicitly, bypassing the need for training or\u0000solving any optimization problem. Additionally, we establish that such a network can achieve universal\u0000approximation in $L^p(Omega;mathbb{R}_+)$, where $Omega$ is a bounded subset\u0000of $mathbb{R}^d$ and $pin[1,infty)$, using a ReLU deep neural network with a\u0000width of $d+1$. We also provide depth estimates for approximating $W^{1,p}$\u0000functions and width estimates for approximating $L^p(Omega;mathbb{R}^m)$ for\u0000$mgeq1$. Our proofs are constructive, offering explicit values for the biases\u0000and weights involved.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}