Foundations of data science (Springfield, Mo.): Latest articles

Random Walks and Markov Chains
Q2 Mathematics Pub Date : 2020-01-01 DOI: 10.1017/9781108755528.004
Avrim Blum, J. Hopcroft, R. Kannan
Citations: 1
Stability of non-linear filter for deterministic dynamics
Q2 Mathematics Pub Date : 2019-10-31 DOI: 10.3934/fods.2021025
A. Reddy, A. Apte
This paper shows that the nonlinear filter in the case of deterministic dynamics is stable with respect to the initial conditions, provided that the observations are sufficiently rich, for both continuous and discrete time filters. Earlier works on the stability of nonlinear filters are set in the context of stochastic dynamics and assume conditions such as a compact state space or a time-independent observation model, whereas we prove filter stability for deterministic dynamics under more general assumptions on the state space and observation process. We give several examples of systems that satisfy these assumptions. We also show that the asymptotic structure of the filtering distribution is related to the dynamical properties of the signal.
Citations: 3
A Bayesian nonparametric test for conditional independence
Q2 Mathematics Pub Date : 2019-10-24 DOI: 10.3934/FODS.2020009
Onur Teymur, S. Filippi
This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Polya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.
Citations: 2
Modelling dynamic network evolution as a Pitman-Yor process
Q2 Mathematics Pub Date : 2019-08-28 DOI: 10.3934/fods.2019013
Francesco Sanna Passino, N. Heard
Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.
Citations: 3
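The Pitman-Yor process named in the abstract above is often introduced through its sequential "Chinese restaurant" construction. The sketch below is a generic illustration of that construction only, not the authors' network model or their hyperparameter estimation method. The function name, the parameter names `alpha` (strength) and `discount`, and the seed handling are assumptions made for this example; it assumes `alpha > 0` and `0 <= discount < 1`.

```python
import random


def sample_pitman_yor_partition(n, alpha=1.0, discount=0.5, seed=0):
    """Draw a random partition of n items via the two-parameter
    Chinese restaurant construction of the Pitman-Yor process.

    Hypothetical helper for illustration; assumes alpha > 0 and
    0 <= discount < 1.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of items in cluster k
    assignments = []   # assignments[i] = cluster index of item i
    for _ in range(n):
        # Unnormalised weights: an existing cluster k gets (size_k - discount),
        # a new cluster gets (alpha + discount * number_of_clusters).
        weights = [size - discount for size in table_sizes]
        weights.append(alpha + discount * len(table_sizes))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new cluster
        else:
            table_sizes[k] += 1        # join an existing cluster
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = sample_pitman_yor_partition(200, seed=1)
```

With a positive discount, the number of clusters tends to grow as a power of `n`, which is the kind of heavy-tailed behaviour the abstract refers to; setting `discount=0` reduces the construction to the Dirichlet process case.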
Bayesian inference for latent chain graphs
Q2 Mathematics Pub Date : 2019-08-12 DOI: 10.3934/fods.2020003
Deng Lu, M. Iorio, A. Jasra, G. Rosner
In this article we consider Bayesian inference for partially observed Andersson-Madigan-Perlman (AMP) Gaussian chain graph (CG) models. Such models are of particular interest in applications such as biological networks and financial time series. The model itself features a variety of constraints which make both prior modeling and computational inference challenging. We develop a framework for the aforementioned challenges, using a sequential Monte Carlo (SMC) method for statistical inference. Our approach is illustrated on simulated data as well as on real case studies from university graduation rates and a pharmacokinetics study.
Citations: 1
EmT: Locating empty territories of homology group generators in a dataset
Q2 Mathematics Pub Date : 2019-06-03 DOI: 10.3934/FODS.2019010
Xin Xu, J. Cisewski-Kehe
Persistent homology is a tool within topological data analysis to detect different dimensional holes in a dataset. The boundaries of the empty territories (i.e., holes) are not well-defined and each has multiple representations. The proposed method, Empty Territory (EmT), provides representations of different dimensional holes with a specified level of complexity of the territory boundary. EmT is designed for the setting where persistent homology uses a Vietoris-Rips complex filtration, and works as a post-analysis to refine the hole representation of the persistent homology algorithm. In particular, EmT uses alpha shapes to obtain a special class of representations that captures the empty territories with a complexity determined by the size of the alpha balls. With a fixed complexity, EmT returns the representation that contains the most points within the special class of representations. This method is limited to finding 1D holes in 2D data and 2D holes in 3D data, and is illustrated on simulation datasets of a homogeneous Poisson point process in 2D and a uniform sampling in 3D. Furthermore, the method is applied to a 2D cell tower location geography dataset and 3D Sloan Digital Sky Survey (SDSS) galaxy dataset, where it works well in capturing the empty territories.
Citations: 0
Levels and trends in the sex ratio at birth and missing female births for 29 states and union territories in India 1990–2016: A Bayesian modeling study
Q2 Mathematics Pub Date : 2019-06-03 DOI: 10.3934/FODS.2019008
Fengqing Chao, A. Yadav
The sex ratio at birth (SRB) in India has risen since the 1970s and reaches well beyond the levels seen under normal circumstances. The lasting imbalance in the SRB has resulted in many more males than females in India. A population with a severely distorted sex ratio is more likely to face a prolonged struggle for stability and sustainability. It is crucial to estimate the SRB and its imbalance for India at the state level and to assess the uncertainty around the estimates. We develop a Bayesian model to estimate the SRB in India from 1990 to 2016 for 29 states and union territories. Our analyses are based on a comprehensive database on state-level SRB with data from the sample registration system, census and Demographic and Health Surveys. The SRB varies greatly across Indian states and union territories in 2016: ranging from 1.026 (95% uncertainty interval [0.971; 1.087]) in Mizoram to 1.181 [1.143; 1.128] in Haryana. We identify 18 states and union territories with imbalanced SRB during 1990–2016, resulting in 14.9 [13.2; 16.5] million missing female births in India. Uttar Pradesh has the largest share of the missing female births among all states and union territories, accounting for 32.8% [29.5%; 36.3%] of the total number.
Citations: 9
Power weighted shortest paths for clustering Euclidean data
Q2 Mathematics Pub Date : 2019-05-30 DOI: 10.3934/fods.2019014
Daniel Mckenzie, S. Damelin
We study the use of power weighted shortest path distance functions for clustering high dimensional Euclidean data, under the assumption that the data is drawn from a collection of disjoint low dimensional manifolds. We argue, theoretically and experimentally, that this leads to higher clustering accuracy. We also present a fast algorithm for computing these distances.
Citations: 16
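The abstract above does not spell out the distance it studies. A common definition, assumed here, charges each hop between two data points the Euclidean length raised to a power p, takes the cheapest path through the data set, and raises the result to 1/p. The sketch below computes this with a plain Floyd-Warshall pass on the complete graph; it is a cubic-time illustration and not the fast algorithm the authors present. The function name and its parameters are hypothetical.

```python
import math


def power_weighted_distances(points, p=2.0):
    """All-pairs power weighted shortest path distances.

    Assumed definition (not stated in the abstract): a hop from x to y
    costs ||x - y||**p, and the distance between two points is the
    cheapest total path cost raised to the power 1/p.
    """
    n = len(points)
    # Start with the cost of the direct hop between every pair.
    cost = [[math.dist(a, b) ** p for b in points] for a in points]
    # Floyd-Warshall: allow each point k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = cost[i][k] + cost[k][j]
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
    return [[c ** (1.0 / p) for c in row] for row in cost]


# Three collinear points: with p = 2 the two short hops (cost 1 + 1)
# beat the direct hop (cost 4).
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d2 = power_weighted_distances(line, p=2.0)
d1 = power_weighted_distances(line, p=1.0)
```

For p = 1 the result is the ordinary Euclidean distance; for p > 1 paths made of many short hops through densely sampled regions become cheaper than a single long jump, which is the intuition behind using the distance for data lying near separate manifolds.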
General risk measures for robust machine learning
Q2 Mathematics Pub Date : 2019-04-26 DOI: 10.3934/fods.2019011
É. Chouzenoux, Henri Gérard, J. Pesquet
A wide array of machine learning problems are formulated as the minimization of the expectation of a convex loss function on some parameter space. Since the probability distribution of the data of interest is usually unknown, it is often estimated from training sets, which may lead to poor out-of-sample performance. In this work, we bring new insights into this problem by using the framework which has been developed in quantitative finance for risk measures. We show that the original min-max problem can be recast as a convex minimization problem under suitable assumptions. We discuss several important examples of robust formulations, in particular by defining ambiguity sets based on $\varphi$-divergences and the Wasserstein metric. We also propose an efficient algorithm for solving the corresponding convex optimization problems involving complex convex constraints. Through simulation examples, we demonstrate that this algorithm scales well on real data sets.
Citations: 6
Estimation and uncertainty quantification for the output from quantum simulators
Q2 Mathematics Pub Date : 2019-03-07 DOI: 10.3934/FODS.2019007
R. Bennink, A. Jasra, K. Law, P. Lougovski
The problem of estimating certain distributions over $\{0, 1\}^d$ is considered here. The distribution represents a quantum system of $d$ qubits, where there are non-trivial dependencies between the qubits. A maximum entropy approach is adopted to reconstruct the distribution from exact moments or observed empirical moments. The Robbins-Monro algorithm is used to solve the intractable maximum entropy problem, by constructing an unbiased estimator of the un-normalized target with a sequential Monte Carlo sampler at each iteration. In the case of empirical moments, this coincides with a maximum likelihood estimator. A Bayesian formulation is also considered in order to quantify uncertainty a posteriori. Several approaches are proposed in order to tackle this challenging problem, based on recently developed methodologies. In particular, unbiased estimators of the gradient of the log posterior are constructed and used within a provably convergent Langevin-based Markov chain Monte Carlo method. The methods are illustrated on classically simulated output from quantum simulators.
Citations: 2
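The Robbins-Monro algorithm mentioned in the abstract above is a stochastic root-finding recursion: it repeatedly nudges an estimate against a noisy evaluation of the function whose root is sought, with step sizes that shrink over time. The sketch below shows only that bare recursion on a toy problem (finding the mean of a sample); the maximum entropy objective and the sequential Monte Carlo estimator used by the authors are not reproduced. The function name, the 1/n step size, and the toy data are assumptions made for this example.

```python
import random


def robbins_monro(noisy_g, theta0, n_steps):
    """Stochastic approximation of a root of g(theta) = E[noisy_g(theta)].

    Hypothetical helper; uses step sizes a_n = 1/n, one classical choice
    whose sum diverges while the sum of squares stays finite.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= (1.0 / n) * noisy_g(theta)
    return theta


# Toy problem: g(theta) = theta - E[X] has its root at the mean of X.
# Each call to the noisy function consumes one fresh observation.
rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(1000)]
draws = iter(samples)
estimate = robbins_monro(lambda theta: theta - next(draws),
                         theta0=0.0, n_steps=len(samples))
sample_mean = sum(samples) / len(samples)
```

In this toy case the recursion with step size 1/n reproduces the running sample mean, so `estimate` and `sample_mean` agree up to floating-point rounding; in harder problems the noisy evaluation would come from a Monte Carlo estimator instead of a single observation.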