Foundations of data science (Springfield, Mo.): Latest articles

Probabilistic learning on manifolds

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-02-28 DOI: 10.3934/fods.2020013

Christian Soize, R. Ghanem

This paper presents mathematical results in support of the methodology of probabilistic learning on manifolds (PLoM) recently introduced by the authors, which has been used successfully to analyze complex engineering systems. The PLoM considers a given initial dataset consisting of a small number of points in a Euclidean space, which are interpreted as independent realizations of a vector-valued random variable whose non-Gaussian probability measure is unknown but is, a priori, concentrated in an unknown subset of the Euclidean space. The objective is to construct a learned dataset consisting of additional realizations that allow converged statistics to be evaluated. The probability measure estimated from the initial dataset is transported through a linear transformation constructed using a reduced-order diffusion-maps basis. In this paper, it is proven that this transported measure is a marginal distribution of the invariant measure of a reduced-order Itô stochastic differential equation that corresponds to a dissipative Hamiltonian dynamical system. This construction preserves the concentration of the probability measure. This property is shown by analyzing a distance between the random matrix constructed with the PLoM and the matrix representing the initial dataset, as a function of the dimension of the basis. It is further proven that this distance attains a minimum at a dimension of the reduced-order diffusion-maps basis that is strictly smaller than the number of points in the initial dataset. Finally, a brief numerical application illustrates the mathematical results.
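The abstract relies on a reduced-order diffusion-maps basis but does not say how one is built. Below is a minimal sketch of the usual first step, assuming the common Gaussian-kernel construction. The function name `diffusion_transition_matrix` and the smoothing parameter `eps` are illustrative and are not taken from the paper. The sketch builds a row-stochastic transition matrix over the data points. A reduced-order basis would then come from its leading eigenvectors, which this sketch does not compute.

```python
import math


def diffusion_transition_matrix(points, eps):
    """Return the row-stochastic matrix P = D^{-1} K over the given points.

    Illustrative sketch only: `points` is a list of equal-length tuples
    (the N realizations) and `eps` > 0 is an assumed kernel smoothing parameter.
    """
    n = len(points)
    # Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / (4 * eps))
    kernel = [
        [math.exp(-math.dist(points[i], points[j]) ** 2 / (4.0 * eps)) for j in range(n)]
        for i in range(n)
    ]
    # Row sums are the diagonal entries of the degree matrix D
    degrees = [sum(row) for row in kernel]
    # Dividing each row by its degree gives a Markov transition matrix on the data
    return [[k / d for k in row] for row, d in zip(kernel, degrees)]


# Two nearby points and one distant point in the plane
pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
P = diffusion_transition_matrix(pts, eps=0.5)
```

Each row of `P` sums to one, so the constant vector is an eigenvector with eigenvalue 1. Nearby points also receive larger transition probabilities than distant ones; here `P[0][1]` exceeds `P[0][2]`.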

Citations: 16

Multilevel Ensemble Kalman Filtering based on a sample average of independent EnKF estimators

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-02-02 DOI: 10.3934/fods.2020017

Håkon Hoel, G. Shaimerdenova, R. Tempone

We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.
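The MLEnKF is assembled from independent EnKF samples, so the EnKF analysis step is one of its basic ingredients. Below is a minimal sketch of that step under several assumptions: a scalar state, a linear observation y = h x + noise with Gaussian noise of variance `obs_var`, and the perturbed-observation ("stochastic") variant of the EnKF. The function name and parameters are illustrative. This is not the multilevel estimator itself.

```python
import math
import random
import statistics


def enkf_analysis(ensemble, y, obs_var, h=1.0, rng=None):
    """Stochastic EnKF update for a scalar state observed as y = h * x + noise.

    Illustrative sketch: `ensemble` is a list of forecast samples, `obs_var`
    is the observation-noise variance, and `rng` is a random.Random instance.
    """
    if rng is None:
        rng = random.Random()
    # Forecast variance estimated from the ensemble itself
    forecast_var = statistics.variance(ensemble)
    # Kalman gain computed with the sample variance in place of the true one
    gain = forecast_var * h / (h * h * forecast_var + obs_var)
    noise_sd = math.sqrt(obs_var)
    # Each member assimilates its own perturbed copy of the observation
    return [x + gain * (y + rng.gauss(0.0, noise_sd) - h * x) for x in ensemble]


rng = random.Random(0)
prior = [rng.gauss(1.0, 2.0) for _ in range(20000)]  # prior mean 1, variance 4
posterior = enkf_analysis(prior, y=3.0, obs_var=1.0, rng=rng)
# Exact Kalman answer for comparison: gain 0.8, mean 2.6, variance 0.8
```

With 20,000 members, the sample mean and variance of `posterior` should be close to the exact Kalman values of 2.6 and 0.8. The remaining gap is finite-ensemble sampling error.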

Citations: 13

Index

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-31 DOI: 10.1017/9781108755528.013

Citations: 0

Introduction

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-31 DOI: 10.1017/9781108755528.001

Citations: 0

High-Dimensional Space

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-31 DOI: 10.1017/9781108755528.002

Citations: 4

Topic Models, Nonnegative Matrix Factorization, Hidden Markov Models, and Graphical Models

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-31 DOI: 10.1017/9781108755528.009

Citations: 0

Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-17 DOI: 10.3934/fods.2020004

M. Bocquet, J. Brajard, A. Carrassi, Laurent Bertino

Reconstructing high-dimensional chaotic dynamics, such as geophysical flows, from observations is hampered by (i) the partial and noisy observations that can realistically be obtained, (ii) the need to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from observations over long time series, several ways of combining data assimilation and machine learning have been suggested. We show how to unify these approaches from a Bayesian perspective using expectation-maximization and coordinate descents. In doing so, the model, the state trajectory, and the model error statistics are estimated jointly. Implementations and approximations of these methods are discussed. Finally, we successfully test the approach numerically on two relevant low-order chaotic models with distinct identifiability.
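The abstract unifies data assimilation and machine learning through expectation-maximization. For reference, the generic EM recursion is written out below. The notation is ours and may differ from the paper's. It assumes the following:

- a hidden trajectory $x_{0:K}$ and observations $y_{1:K}$;
- parameters $\theta$ that include a surrogate model $\mathcal{M}$ and a model-error covariance $\mathbf{Q}$;
- additive Gaussian model error $x_k = \mathcal{M}(x_{k-1}) + \eta_k$ with $\eta_k \sim \mathcal{N}(0, \mathbf{Q})$.

```latex
\begin{aligned}
\text{E-step:}\quad & \mathcal{L}(\theta \mid \theta^{(j)})
  = \mathbb{E}_{p(x_{0:K} \mid y_{1:K},\, \theta^{(j)})}
    \big[\ln p(x_{0:K}, y_{1:K} \mid \theta)\big], \\
\text{M-step:}\quad & \theta^{(j+1)} = \arg\max_{\theta}\, \mathcal{L}(\theta \mid \theta^{(j)}).
\end{aligned}
```

Hold $\mathcal{M}$ fixed, which is one coordinate of a coordinate-descent sweep. Maximizing over $\mathbf{Q}$ then has the closed form

```latex
\mathbf{Q}^{(j+1)} = \frac{1}{K} \sum_{k=1}^{K}
  \mathbb{E}\Big[\big(x_k - \mathcal{M}(x_{k-1})\big)\big(x_k - \mathcal{M}(x_{k-1})\big)^{\top}\Big],
```

where the expectation is taken over the smoothing distribution from the E-step. In practice that distribution has to be approximated, for instance with ensemble data assimilation methods.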

Citations: 75

Mean-field and kinetic descriptions of neural differential equations

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-07 DOI: 10.3934/fods.2022007

M. Herty, T. Trimborn, G. Visconti

Nowadays, neural networks are widely used in many applications as artificial intelligence models for learning tasks. Since neural networks typically process very large amounts of data, it is convenient to formulate them within mean-field and kinetic theory. In this work we focus on a particular class of neural networks, namely residual neural networks, assuming that each layer is characterized by the same number of neurons N, which is fixed by the dimension of the data. This assumption allows the residual neural network to be interpreted as a time-discretized ordinary differential equation, in analogy with neural differential equations. The mean-field description is then obtained in the limit of infinitely many input data. This leads to a Vlasov-type partial differential equation that describes the evolution of the distribution of the input data. We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias. In the simple setting of a linear activation function and one-dimensional input data, the study of the moments provides insights into the choice of the network parameters. Furthermore, a modification of the microscopic dynamics, inspired by stochastic residual neural networks, leads to a Fokker-Planck formulation of the network, in which the concept of network training is replaced by the task of fitting distributions. The analysis is validated by artificial numerical simulations. In particular, results on classification and regression problems are presented.
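The abstract reads a residual network with equal width in every layer as a time-discretized ODE. The toy sketch below illustrates that reading. It assumes scalar inputs (N = 1), the linear activation σ(u) = u mentioned in the abstract, and a hypothetical step size `h`. Under these assumptions, each residual layer is an explicit Euler step of dx/dt = w x + b. Because each layer is affine, the mean and variance of the data obey a closed recursion. The sketch illustrates the moment idea only. It is not the authors' mean-field (Vlasov) derivation.

```python
import statistics


def resnet_forward(x, weights, biases, h):
    """Scalar residual network with linear activation sigma(u) = u."""
    for w, b in zip(weights, biases):
        # One residual layer = one explicit Euler step of dx/dt = w * x + b
        x = x + h * (w * x + b)
    return x


def moment_recursion(mean, var, weights, biases, h):
    """Propagate the mean and variance of the inputs through the same layers."""
    for w, b in zip(weights, biases):
        # Each layer is the affine map x -> (1 + h*w) * x + h*b
        mean = (1.0 + h * w) * mean + h * b
        var = (1.0 + h * w) ** 2 * var
    return mean, var


inputs = [0.0, 1.0, 2.0, 3.0]
weights, biases, h = [-0.5] * 10, [0.2] * 10, 0.1  # 10 layers with shared parameters
outputs = [resnet_forward(x, weights, biases, h) for x in inputs]
predicted_mean, predicted_var = moment_recursion(
    statistics.mean(inputs), statistics.pvariance(inputs), weights, biases, h
)
```

Because every layer is affine, the empirical mean and population variance of `outputs` match `predicted_mean` and `predicted_var` up to rounding. With a nonlinear activation, the moment equations would no longer close this simply.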

Citations: 4

Topological reconstruction of sub-cellular motion with Ensemble Kalman velocimetry

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-01 DOI: 10.3934/fods.2020007

Le Yin, Ioannis Sgouralis, V. Maroulas

Microscopy imaging of plant cells allows detailed analysis of the sub-cellular motion of organelles, and the resulting large video datasets can be analyzed efficiently by automated algorithms. We develop a novel, data-oriented algorithm that tracks organelle movements and reconstructs their trajectories from stacks of image data. Our method proceeds in three steps: (i) identification, (ii) localization, and (iii) linking. It combines topological data analysis and Ensemble Kalman Filtering, and does not assume a specific motion model. Applied to simulated datasets, the method agrees with the ground truth. We also successfully test our method on real microscopy data.
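The abstract's third step, linking, joins detections in consecutive frames into trajectories. The paper does this by combining Ensemble Kalman Filtering with topological data analysis. The sketch below deliberately swaps in a much simpler greedy nearest-neighbour linker, only to make the notion of linking concrete. The function `link_frames` and its `max_dist` gating threshold are hypothetical.

```python
import math


def link_frames(prev_positions, next_positions, max_dist):
    """Greedy nearest-neighbour linking between two frames (illustrative only).

    Returns a dict mapping indices in `prev_positions` to indices in
    `next_positions`; detections farther apart than `max_dist` stay unlinked.
    """
    # All candidate pairs, shortest distance first
    pairs = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(prev_positions)
        for j, q in enumerate(next_positions)
    )
    links, used = {}, set()
    for d, i, j in pairs:
        if d > max_dist:
            break  # every remaining pair is even farther apart
        if i not in links and j not in used:
            links[i] = j
            used.add(j)
    return links


prev_frame = [(0.0, 0.0), (5.0, 5.0)]
next_frame = [(5.2, 5.1), (0.1, -0.1)]
links = link_frames(prev_frame, next_frame, max_dist=1.0)  # {0: 1, 1: 0}
```

Greedy matching of this kind can mislink objects that pass close to each other. It should not be read as a substitute for the paper's method.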

Citations: 0

Stochastic gradient descent algorithm for stochastic optimization in solving analytic continuation problems

Q2 MATHEMATICS, APPLIED

Foundations of data science (Springfield, Mo.)

Pub Date : 2020-01-01 DOI: 10.3934/fods.2020001

F. Bao, T. Maier

We propose an optimization algorithm based on stochastic gradient descent to solve the analytic continuation problem, in which real-frequency spectra are extracted from imaginary-time Quantum Monte Carlo data. Analytic continuation is an ill-posed inverse problem that is usually solved by regularized optimization methods, such as the Maximum Entropy method, or by stochastic optimization methods. The main contribution of this work is to improve the performance of stochastic optimization approaches by introducing a supervised stochastic gradient descent algorithm to solve a flipped inverse system, which processes the random solutions obtained by a type of Fast and Efficient Stochastic Optimization Method.
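For context, the ill-posed forward relation behind analytic continuation can be written as follows. This form assumes a fermionic imaginary-time Green's function at inverse temperature $\beta$ and holds up to sign convention.

```latex
G(\tau) = \int_{-\infty}^{\infty} K(\tau, \omega)\, A(\omega)\, d\omega,
\qquad
K(\tau, \omega) = \frac{e^{-\tau \omega}}{1 + e^{-\beta \omega}},
\qquad 0 \le \tau \le \beta ,
```

Here $A(\omega) \ge 0$ is the real-frequency spectral function to be recovered. Discretizing on grids $\tau_i$ and $\omega_j$ gives a linear system $G_i \approx \sum_j K_{ij} A_j \, \Delta\omega$ with $K_{ij} = K(\tau_i, \omega_j)$. The singular values of this matrix decay rapidly, so small noise in the Quantum Monte Carlo data $G_i$ can produce large changes in $A$. This is why regularized or stochastic optimization methods are needed.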

Citations: 6
