
Latest Articles in J. Mach. Learn. Res.

Characteristic and Universal Tensor Product Kernels
Pub Date : 2017-08-28 DOI: 10.13140/RG.2.2.27112.37120
Z. Szabó, Bharath K. Sriperumbudur
Maximum mean discrepancy (MMD), also called energy distance or N-distance in statistics, and the Hilbert-Schmidt independence criterion (HSIC), known in statistics as distance covariance, are among the most popular and successful approaches for quantifying, respectively, the difference between and the independence of random variables. Thanks to their kernel-based foundations, MMD and HSIC are applicable on a wide variety of domains. Despite their tremendous success, surprisingly little is known about when HSIC characterizes independence and when MMD with a tensor product kernel can discriminate between probability distributions. In this paper, we answer these questions by studying various notions of the characteristic property of the tensor product kernel.
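
To make the two statistics concrete, here is a minimal NumPy sketch (not taken from the paper) of a biased MMD^2 estimate between two samples and a biased HSIC estimate between paired samples, where the joint kernel is the tensor product of two Gaussian kernels. The bandwidth, sample sizes, and synthetic data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples X and Y."""
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def hsic_biased(X, Y, sigma=1.0):
    """Biased HSIC estimate for paired samples (X_i, Y_i); the joint kernel
    is the tensor product k(x, x') * l(y, y') of two Gaussian kernels."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    L = gaussian_kernel(Y, Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X2 = rng.normal(size=(200, 2))                    # same distribution as X
Y_shift = rng.normal(size=(200, 2)) + 1.0         # shifted distribution
Y_dep = X + 0.1 * rng.normal(size=(200, 2))       # strongly dependent on X
Y_ind = rng.normal(size=(200, 2))                 # independent of X
print("MMD^2 same dist.  :", round(mmd2_biased(X, X2), 4))
print("MMD^2 shifted     :", round(mmd2_biased(X, Y_shift), 4))
print("HSIC  dependent   :", round(hsic_biased(X, Y_dep), 4))
print("HSIC  independent :", round(hsic_biased(X, Y_ind), 4))
```
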
{"title":"Characteristic and Universal Tensor Product Kernels","authors":"Z. Szabó, Bharath K. Sriperumbudur","doi":"10.13140/RG.2.2.27112.37120","DOIUrl":"https://doi.org/10.13140/RG.2.2.27112.37120","url":null,"abstract":"Maximum mean discrepancy (MMD), also called energy distance or N-distance in statistics and Hilbert-Schmidt independence criterion (HSIC), specifically distance covariance in statistics, are among the most popular and successful approaches to quantify the difference and independence of random variables, respectively. Thanks to their kernel-based foundations, MMD and HSIC are applicable on a wide variety of domains. Despite their tremendous success, quite little is known about when HSIC characterizes independence and when MMD with tensor product kernel can discriminate probability distributions. In this paper, we answer these questions by studying various notions of characteristic property of the tensor product kernel.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"32 1","pages":"233:1-233:29"},"PeriodicalIF":0.0,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79714976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 60
Matrix Completion and Related Problems via Strong Duality
Pub Date : 2017-04-27 DOI: 10.4230/LIPIcs.ITCS.2018.5
Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang
This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals have the same optimum. This has been well understood for convex optimization, but little was known for non-convex problems. We propose a novel analytical framework and show that, under certain dual conditions, the optimal solution of the matrix factorization program is the same as that of its bi-dual, and thus the global optimum of the non-convex program can be achieved by solving its bi-dual, which is convex. These dual conditions are satisfied by a wide class of matrix factorization problems, although matrix factorization problems are hard to solve in full generality. This analytical framework may be of independent interest for non-convex optimization more broadly. We apply our framework to two prototypical matrix factorization problems: matrix completion and robust Principal Component Analysis (PCA). These are examples of efficiently recovering a hidden matrix given limited reliable observations of it. Our framework shows that exact recoverability and strong duality hold with nearly-optimal sample complexity guarantees for matrix completion and robust PCA.
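
The paper's contribution is the duality analysis itself; purely as an illustration of the matrix completion problem it studies, the sketch below runs a standard convex surrogate, iterative singular-value soft-thresholding (Soft-Impute style) for nuclear-norm-regularized completion. The regularization level, iteration count, and synthetic data are arbitrary assumptions, and this is not the paper's algorithm.

```python
import numpy as np

def soft_impute(M_obs, mask, lam=0.5, n_iters=200):
    """Nuclear-norm-regularized matrix completion by iterative singular-value
    soft-thresholding. M_obs holds the observed entries (zeros elsewhere);
    mask is 1 where an entry is observed."""
    Z = np.zeros_like(M_obs)
    for _ in range(n_iters):
        # Keep observed entries, fill unobserved ones with the current estimate.
        filled = mask * M_obs + (1 - mask) * Z
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)          # soft-threshold the singular values
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(0)
n, r = 60, 3
truth = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # hidden rank-3 matrix
mask = (rng.random((n, n)) < 0.4).astype(float)              # about 40% observed
Z = soft_impute(mask * truth, mask)
err = np.linalg.norm((1 - mask) * (Z - truth)) / np.linalg.norm((1 - mask) * truth)
print("relative error on unobserved entries:", round(err, 3))
```
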
{"title":"Matrix Completion and Related Problems via Strong Duality","authors":"Maria-Florina Balcan, Yingyu Liang, David P. Woodruff, Hongyang Zhang","doi":"10.4230/LIPIcs.ITCS.2018.5","DOIUrl":"https://doi.org/10.4230/LIPIcs.ITCS.2018.5","url":null,"abstract":"This work studies the strong duality of non-convex matrix factorization problems: we show that under certain dual conditions, these problems and its dual have the same optimum. This has been well understood for convex optimization, but little was known for non-convex problems. We propose a novel analytical framework and show that under certain dual conditions, the optimal solution of the matrix factorization program is the same as its bi-dual and thus the global optimality of the non-convex program can be achieved by solving its bi-dual which is convex. These dual conditions are satisfied by a wide class of matrix factorization problems, although matrix factorization problems are hard to solve in full generality. This analytical framework may be of independent interest to non-convex optimization more broadly. \u0000 \u0000We apply our framework to two prototypical matrix factorization problems: matrix completion and robust Principal Component Analysis (PCA). These are examples of efficiently recovering a hidden matrix given limited reliable observations of it. Our framework shows that exact recoverability and strong duality hold with nearly-optimal sample complexity guarantees for matrix completion and robust PCA.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"35 1","pages":"102:1-102:56"},"PeriodicalIF":0.0,"publicationDate":"2017-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91541053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Training Gaussian Mixture Models at Scale via Coresets
Pub Date : 2017-03-23 DOI: 10.3929/ETHZ-B-000269194
Mario Lucic, Matthew Faulkner, Andreas Krause, Dan Feldman
How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world data sets suggests that our coreset-based approach enables significant reduction in training-time with negligible approximation error.
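
As a rough illustration of the general recipe (importance sampling with weights driven by approximate sensitivities around a cheap clustering), here is a NumPy sketch. The k-means++-based sensitivity proxy, the coreset size, and the synthetic data are assumptions; the sketch does not reproduce the paper's construction or its guarantees.

```python
import numpy as np

def kmeanspp_centers(X, k, rng):
    """k-means++ seeding: a cheap rough clustering used to define sensitivities."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def gmm_coreset(X, k, m, seed=0):
    """Sample a weighted coreset of size m via approximate sensitivities."""
    rng = np.random.default_rng(seed)
    centers = kmeanspp_centers(X, k, rng)
    d2 = np.min(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    sens = d2 / d2.sum() + 1.0 / len(X)          # crude proxy for point sensitivity
    p = sens / sens.sum()
    idx = rng.choice(len(X), size=m, p=p)
    weights = 1.0 / (m * p[idx])                 # importance weights keep sums unbiased
    return X[idx], weights

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (50000, 2)), rng.normal(3, 1, (50000, 2))])
C, w = gmm_coreset(X, k=2, m=500)
# A weighted EM (i.e. Gaussian-mixture fitting with per-point weights) would now be
# run on (C, w) instead of on the full 100k-point data set.
print(C.shape, round(w.sum()))   # the weights roughly sum to the original data set size
```
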
{"title":"Training Gaussian Mixture Models at Scale via Coresets","authors":"Mario Lucic, Matthew Faulkner, Andreas Krause, Dan Feldman","doi":"10.3929/ETHZ-B-000269194","DOIUrl":"https://doi.org/10.3929/ETHZ-B-000269194","url":null,"abstract":"How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world data sets suggests that our coreset-based approach enables significant reduction in training-time with negligible approximation error.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"13 1","pages":"160:1-160:25"},"PeriodicalIF":0.0,"publicationDate":"2017-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89747655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 82
Convergence Rate of a Simulated Annealing Algorithm with Noisy Observations
Pub Date : 2017-03-01 DOI: 10.5555/3322706.3322710
Clément Bouttier, Ioana Gavra
In this paper we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function whose evaluations are noisy. We provide a rate of convergence and an optimized parametrization that ensures a minimal number of evaluations for a given accuracy and a confidence level close to 1. The work is completed by a set of numerical experiments that assess the practical performance on both benchmark test cases and real-world examples.
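
A toy sketch of the kind of algorithm analyzed: simulated annealing in which each comparison averages several noisy evaluations, with a log-type cooling schedule. The repeat schedule, step size, and test function are ad hoc assumptions rather than the paper's optimized parametrization.

```python
import numpy as np

def noisy_f(x, rng, noise=0.3):
    """Noisy oracle for the target f(x) = sum(x^2), whose global minimum is at 0."""
    return float(np.sum(x ** 2) + noise * rng.normal())

def noisy_simulated_annealing(dim=2, n_iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)
    n_eval = lambda t: 1 + t // 200              # average more noisy samples as we cool
    fx = np.mean([noisy_f(x, rng) for _ in range(n_eval(0))])
    for t in range(1, n_iters + 1):
        temp = 1.0 / np.log(t + 1)               # slow, logarithmic-type cooling
        y = x + 0.3 * rng.normal(size=dim)       # random local proposal
        fy = np.mean([noisy_f(y, rng) for _ in range(n_eval(t))])
        # Metropolis acceptance based on the *estimated* objective values.
        if fy < fx or rng.random() < np.exp(-(fy - fx) / temp):
            x, fx = y, fy
    return x

x_best = noisy_simulated_annealing()
print("approximate minimizer:", np.round(x_best, 2))
```
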
{"title":"Convergence Rate of a Simulated Annealing Algorithm with Noisy Observations","authors":"Clément Bouttier, Ioana Gavra","doi":"10.5555/3322706.3322710","DOIUrl":"https://doi.org/10.5555/3322706.3322710","url":null,"abstract":"In this paper we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function with noisy evaluations. We provide a rate of convergence and its optimized parametrization to ensure a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed with a set of numerical experimentations and assesses the practical performance both on benchmark test cases and on real world examples.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"38 1","pages":"4:1-4:45"},"PeriodicalIF":0.0,"publicationDate":"2017-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75707618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness
Pub Date : 2017-01-01 DOI: 10.5555/3122009.3242027
F. Angiulli
Over the years, different characterizations of the curse of dimensionality have been provided, usually stating the conditions under which, in the limit of the infinite dimensionality, distances become indistinguishable. However, these characterizations almost never address the form of associated distributions in the finite, although high-dimensional, case. This work aims to contribute in this respect by investigating the distribution of distances, and of direct and reverse nearest neighbors, in intrinsically high-dimensional spaces. Indeed, we derive a closed form for the distribution of distances from a given point, for the expected distance from a given point to its k-th nearest neighbor, and for the expected size of the approximate set of neighbors of a given point in finite high-dimensional spaces. Additionally, the hubness problem is considered, which is related to the form of the function N_k representing the number of points that have a given point as one of their k nearest neighbors, which is also called the number of k-occurrences. Despite the extensive use of this function, the precise characterization of its form is a longstanding problem. We derive a closed form for the number of k-occurrences associated with a given point in finite high-dimensional spaces, together with the associated limiting probability distribution. By investigating the relationships with the hubness phenomenon emerging in network science, we find that the distribution of node (in-)degrees of some real-life, large-scale networks has connections with the distribution of k-occurrences described herein.
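
A small experiment (not from the paper) that computes the k-occurrence counts N_k on i.i.d. Gaussian data and shows how skewed they become as the dimension grows; the sample size, k, and the dimensions tried are arbitrary choices.

```python
import numpy as np

def k_occurrences(X, k):
    """N_k(i): in how many other points' k-nearest-neighbor lists point i appears."""
    sq = (X ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T    # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-neighbors
    knn = np.argsort(d2, axis=1)[:, :k]             # k nearest neighbors of each point
    return np.bincount(knn.ravel(), minlength=len(X))

rng = np.random.default_rng(0)
for dim in (3, 30, 300):
    X = rng.normal(size=(1000, dim))
    Nk = k_occurrences(X, k=10)
    print(f"d={dim:4d}  max N_k={Nk.max():4d}  std(N_k)={Nk.std():.1f}")
# In higher dimensions a few points ('hubs') tend to collect very large N_k values.
```
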
{"title":"On the Behavior of Intrinsically High-Dimensional Spaces: Distances, Direct and Reverse Nearest Neighbors, and Hubness","authors":"F. Angiulli","doi":"10.5555/3122009.3242027","DOIUrl":"https://doi.org/10.5555/3122009.3242027","url":null,"abstract":"Over the years, different characterizations of the curse of dimensionality have been provided, usually stating the conditions under which, in the limit of the infinite dimensionality, distances become indistinguishable. However, these characterizations almost never address the form of associated distributions in the finite, although high-dimensional, case. This work aims to contribute in this respect by investigating the distribution of distances, and of direct and reverse nearest neighbors, in intrinsically high-dimensional spaces. Indeed, we derive a closed form for the distribution of distances from a given point, for the expected distance from a given point to its kth nearest neighbor, and for the expected size of the approximate set of neighbors of a given point in finite high-dimensional spaces. Additionally, the hubness problem is considered, which is related to the form of the function Nk representing the number of points that have a given point as one of their k nearest neighbors, which is also called the number of k-occurrences. Despite the extensive use of this function, the precise characterization of its form is a longstanding problem. We derive a closed form for the number of k-occurrences associated with a given point in finite high-dimensional spaces, together with the associated limiting probability distribution. By investigating the relationships with the hubness phenomenon emerging in network science, we find that the distribution of node (in-)degrees of some real-life, large-scale networks has connections with the distribution of k-occurrences described herein.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"79 1","pages":"170:1-170:60"},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90421272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Optimal Quantum Sample Complexity of Learning Algorithms
Pub Date : 2016-07-04 DOI: 10.4230/LIPIcs.CCC.2017.25
Srinivasan Arunachalam, R. D. Wolf
In learning theory, the VC dimension $d$ of a concept class $C$ is the most common way to measure its "richness." In the PAC model, $$\Theta\Big(\frac{d}{\varepsilon} + \frac{\log(1/\delta)}{\varepsilon}\Big)$$ examples are necessary and sufficient for a learner to output, with probability $1-\delta$, a hypothesis $h$ that is $\varepsilon$-close to the target concept $c$. In the related agnostic model, where the samples need not come from a $c \in C$, we know that $$\Theta\Big(\frac{d}{\varepsilon^2} + \frac{\log(1/\delta)}{\varepsilon^2}\Big)$$ examples are necessary and sufficient to output a hypothesis $h \in C$ whose error is at most $\varepsilon$ worse than that of the best concept in $C$. Here we analyze quantum sample complexity, where each example is a coherent quantum state. This model was introduced by Bshouty and Jackson, who showed that quantum examples are more powerful than classical examples in some fixed-distribution settings. However, Atici and Servedio, improved by Zhang, showed that in the PAC setting quantum examples cannot be much more powerful: the required number of quantum examples is $$\Omega\Big(\frac{d^{1-\eta}}{\varepsilon} + d + \frac{\log(1/\delta)}{\varepsilon}\Big) \text{ for all } \eta > 0.$$ Our main result is that quantum and classical sample complexity are in fact equal up to constant factors in both the PAC and agnostic models. We give two approaches. The first is a fairly simple information-theoretic argument that yields the above two classical bounds and yields the same bounds for quantum sample complexity up to a $\log(d/\varepsilon)$ factor. We then give a second approach that avoids the log-factor loss, based on analyzing the behavior of the "Pretty Good Measurement" on the quantum state identification problems that correspond to learning. This shows that classical and quantum sample complexity are equal up to constant factors.
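
A tiny helper that evaluates the two displayed sample-complexity bounds with the unspecified constant factors set to 1, just to show how the required number of examples scales with $d$, $\varepsilon$ and $\delta$; the numeric inputs are made up.

```python
import math

def pac_examples(d, eps, delta):
    """Order of the realizable PAC sample complexity: d/eps + log(1/delta)/eps."""
    return d / eps + math.log(1 / delta) / eps

def agnostic_examples(d, eps, delta):
    """Order of the agnostic sample complexity: d/eps^2 + log(1/delta)/eps^2."""
    return d / eps ** 2 + math.log(1 / delta) / eps ** 2

# VC dimension 20, accuracy 0.05, confidence 0.99 (constant factors omitted).
print(int(pac_examples(20, 0.05, 0.01)), int(agnostic_examples(20, 0.05, 0.01)))
```
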
{"title":"Optimal Quantum Sample Complexity of Learning Algorithms","authors":"Srinivasan Arunachalam, R. D. Wolf","doi":"10.4230/LIPIcs.CCC.2017.25","DOIUrl":"https://doi.org/10.4230/LIPIcs.CCC.2017.25","url":null,"abstract":"$ newcommand{eps}{varepsilon} $In learning theory, the VC dimension of a concept class $C$ is the most common way to measure its \"richness.\" In the PAC model $$ ThetaBig(frac{d}{eps} + frac{log(1/delta)}{eps}Big) $$ examples are necessary and sufficient for a learner to output, with probability $1-delta$, a hypothesis $h$ that is $eps$-close to the target concept $c$. In the related agnostic model, where the samples need not come from a $cin C$, we know that $$ ThetaBig(frac{d}{eps^2} + frac{log(1/delta)}{eps^2}Big) $$ examples are necessary and sufficient to output an hypothesis $hin C$ whose error is at most $eps$ worse than the best concept in $C$. \u0000Here we analyze quantum sample complexity, where each example is a coherent quantum state. This model was introduced by Bshouty and Jackson, who showed that quantum examples are more powerful than classical examples in some fixed-distribution settings. However, Atici and Servedio, improved by Zhang, showed that in the PAC setting, quantum examples cannot be much more powerful: the required number of quantum examples is $$ OmegaBig(frac{d^{1-eta}}{eps} + d + frac{log(1/delta)}{eps}Big)mbox{ for all }eta> 0. $$ Our main result is that quantum and classical sample complexity are in fact equal up to constant factors in both the PAC and agnostic models. We give two approaches. The first is a fairly simple information-theoretic argument that yields the above two classical bounds and yields the same bounds for quantum sample complexity up to a $log(d/eps)$ factor. We then give a second approach that avoids the log-factor loss, based on analyzing the behavior of the \"Pretty Good Measurement\" on the quantum state identification problems that correspond to learning. This shows classical and quantum sample complexity are equal up to constant factors.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"6 1","pages":"71:1-71:36"},"PeriodicalIF":0.0,"publicationDate":"2016-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78283883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 86
Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs
Pub Date : 2016-06-22 DOI: 10.3929/ETHZ-B-000278021
Emilija Perkovic, J. Textor, M. Kalisch, M. Maathuis
We present a graphical criterion for covariate adjustment that is sound and complete for four different classes of causal graphical models: directed acyclic graphs (DAGs), maximum ancestral graphs (MAGs), completed partially directed acyclic graphs (CPDAGs), and partial ancestral graphs (PAGs). Our criterion unifies covariate adjustment for a large set of graph classes. Moreover, whenever any set satisfies our criterion, we define an explicit set that does so. We also give efficient algorithms for constructing all (minimal) sets that fulfill our criterion, implemented in the R package dagitty. Finally, we discuss the relationship between our criterion and other adjustment criteria, and we provide a new, elementary soundness proof for the adjustment criterion for DAGs of Shpitser, VanderWeele and Robins (UAI 2010).
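
The paper's criterion covers MAGs, CPDAGs, and PAGs; as a much simpler illustration of adjustment in the plain DAG case, the sketch below tests Pearl's back-door condition for a candidate set using moralization-based d-separation. The graph, variable names, and helper functions are hypothetical, and this is neither the paper's criterion nor the dagitty implementation it describes.

```python
import networkx as nx

def d_separated(G, X, Y, Z):
    """Test X independent of Y given Z in DAG G via the moralized ancestral graph."""
    X, Y, Z = set(X), set(Y), set(Z)
    relevant = X | Y | Z
    for v in list(relevant):
        relevant |= nx.ancestors(G, v)
    H = G.subgraph(relevant)
    M = nx.Graph()
    M.add_nodes_from(H.nodes)
    M.add_edges_from(H.edges)                 # drop edge directions
    for v in H.nodes:                         # moralize: connect parents of each node
        parents = list(H.predecessors(v))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                M.add_edge(parents[i], parents[j])
    M.remove_nodes_from(Z)
    return not any(nx.has_path(M, x, y)
                   for x in X for y in Y if x in M and y in M)

def is_backdoor_set(G, x, y, Z):
    """Back-door criterion: Z contains no descendant of x, and Z blocks all
    back-door paths, i.e. x is d-separated from y given Z once edges out of x
    are removed."""
    if set(Z) & (nx.descendants(G, x) | {x, y}):
        return False
    G_cut = G.copy()
    G_cut.remove_edges_from(list(G.out_edges(x)))
    return d_separated(G_cut, {x}, {y}, set(Z))

# Hypothetical DAG: confounder U affects both treatment T and outcome O.
G = nx.DiGraph([("U", "T"), ("U", "O"), ("T", "O")])
print(is_backdoor_set(G, "T", "O", {"U"}))   # True: adjusting for U suffices
print(is_backdoor_set(G, "T", "O", set()))   # False: open back-door path T <- U -> O
```
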
{"title":"Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs","authors":"Emilija Perkovic, J. Textor, M. Kalisch, M. Maathuis","doi":"10.3929/ETHZ-B-000278021","DOIUrl":"https://doi.org/10.3929/ETHZ-B-000278021","url":null,"abstract":"We present a graphical criterion for covariate adjustment that is sound and complete for four different classes of causal graphical models: directed acyclic graphs (DAGs), maximum ancestral graphs (MAGs), completed partially directed acyclic graphs (CPDAGs), and partial ancestral graphs (PAGs). Our criterion unifies covariate adjustment for a large set of graph classes. Moreover, we define an explicit set that satisfies our criterion, if there is any set that satisfies our criterion. We also give efficient algorithms for constructing all (minimal) sets that fulfill our criterion, implemented in the R package dagitty. Finally, we discuss the relationship between our criterion and other criteria for adjustment, and we provide a new, elementary soundness proof for the adjustment criterion for DAGs of Shpitser, VanderWeele and Robins (UAI 2010).","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"91 1","pages":"220:1-220:62"},"PeriodicalIF":0.0,"publicationDate":"2016-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82371002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 111
A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation
Pub Date : 2016-05-23 DOI: 10.17863/CAM.20846
T. Bui, Josiah Yan, Richard E. Turner
Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when the data are sufficiently numerous or when non-Gaussian models are employed. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper, we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way, all of the approximation is performed at 'inference time' rather than at 'modelling time', resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.
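
The framework unifies many pseudo-point approximations; the sketch below implements only the simplest classical member of that family, a DTC/SoR-style predictive built from a handful of inducing (pseudo) inputs, with ad hoc kernel hyperparameters. It is meant to show what a pseudo-point summary of the data looks like, not to implement the Power EP framework itself.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf2=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return sf2 * np.exp(-0.5 * sq / ell ** 2)

def dtc_predict(X, y, Xu, Xs, noise=0.1, jitter=1e-6):
    """DTC/SoR-style sparse GP prediction with inducing inputs Xu:
    mean_* = K_{*u} (K_{uf} K_{fu} + noise^2 K_{uu})^{-1} K_{uf} y."""
    Kuu = rbf(Xu, Xu) + jitter * np.eye(len(Xu))
    Kuf = rbf(Xu, X)
    Ksu = rbf(Xs, Xu)
    A = Kuf @ Kuf.T + noise ** 2 * Kuu
    mean = Ksu @ np.linalg.solve(A, Kuf @ y)
    # DTC variance: k_** - Q_** + noise^2 * K_{*u} A^{-1} K_{u*}
    Qss = Ksu @ np.linalg.solve(Kuu, Ksu.T)
    var = (rbf(Xs, Xs).diagonal() - Qss.diagonal()
           + noise ** 2 * (Ksu @ np.linalg.solve(A, Ksu.T)).diagonal())
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)
Xu = np.linspace(-3, 3, 15)[:, None]             # 15 pseudo points summarise 500 data
Xs = np.linspace(-3, 3, 5)[:, None]              # test inputs
mean, var = dtc_predict(X, y, Xu, Xs)
print(np.round(mean, 2), np.round(var, 3))
```
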
{"title":"A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation","authors":"T. Bui, Josiah Yan, Richard E. Turner","doi":"10.17863/CAM.20846","DOIUrl":"https://doi.org/10.17863/CAM.20846","url":null,"abstract":"Gaussian processes (GPs) are flexible distributions over functions that enable high-level assumptions about unknown functions to be encoded in a parsimonious, flexible and general way. Although elegant, the application of GPs is limited by computational and analytical intractabilities that arise when data are sufficiently numerous or when employing non-Gaussian models. Consequently, a wealth of GP approximation schemes have been developed over the last 15 years to address these key limitations. Many of these schemes employ a small set of pseudo data points to summarise the actual data. In this paper, we develop a new pseudo-point approximation framework using Power Expectation Propagation (Power EP) that unifies a large number of these pseudo-point approximations. Unlike much of the previous venerable work in this area, the new framework is built on standard methods for approximate inference (variational free-energy, EP and Power EP methods) rather than employing approximations to the probabilistic generative model itself. In this way, all of approximation is performed at `inference time' rather than at `modelling time' resolving awkward philosophical and empirical questions that trouble previous approaches. Crucially, we demonstrate that the new framework includes new pseudo-point approximation methods that outperform current approaches on regression and classification tasks.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"12 1","pages":"104:1-104:72"},"PeriodicalIF":0.0,"publicationDate":"2016-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90442053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 123
Minimax Estimation of Kernel Mean Embeddings
Pub Date : 2016-02-13 DOI: 10.15496/PUBLIKATION-30501
I. Tolstikhin, Bharath K. Sriperumbudur, Krikamol Muandet
In this paper, we study the minimax estimation of the Bochner integral $$\mu_k(P) := \int_{\mathcal{X}} k(\cdot, x)\, dP(x),$$ also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k : \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ is a positive definite kernel. Various estimators (including the empirical estimator) $\hat{\theta}_n$ of $\mu_k(P)$ are studied in the literature, all of which satisfy $\bigl\| \hat{\theta}_n - \mu_k(P) \bigr\|_{\mathcal{H}_k} = O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is to show that the above-mentioned rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$- and $\|\cdot\|_{L^2(\mathbb{R}^d)}$-norms over the class of discrete measures and the class of measures with infinitely differentiable densities, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and of the density of $P$ (if it exists). This result has practical consequences in statistical applications, as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection through its relation to energy distance (and distance covariance).
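
A quick one-dimensional check, not from the paper: for a Gaussian kernel and a Gaussian $P$, the embedding $\mu_k(P)$ has a closed form (a Gaussian convolution), so the error of the empirical embedding can be watched shrinking at roughly the $n^{-1/2}$ rate. The bandwidth, the distribution, and the evaluation grid are arbitrary assumptions.

```python
import numpy as np

sigma2, s2 = 1.0, 2.0                       # kernel bandwidth^2 and variance of P
t = np.linspace(-5, 5, 201)                 # evaluation grid

def true_embedding(t):
    """mu_k(P)(t) for k Gaussian(sigma2) and P = N(0, s2): convolving the kernel
    with the Gaussian density gives sqrt(sigma2/(sigma2+s2)) * N-shaped profile."""
    return np.sqrt(sigma2 / (sigma2 + s2)) * np.exp(-t ** 2 / (2 * (sigma2 + s2)))

def empirical_embedding(t, x):
    """hat{mu}_n(t) = (1/n) * sum_i k(t, x_i)."""
    return np.exp(-(t[:, None] - x[None, :]) ** 2 / (2 * sigma2)).mean(axis=1)

rng = np.random.default_rng(0)
for n in (100, 1000, 10000, 20000):
    x = rng.normal(scale=np.sqrt(s2), size=n)
    err = np.max(np.abs(empirical_embedding(t, x) - true_embedding(t)))
    print(f"n={n:6d}  sup-error on grid = {err:.4f}  (n^-1/2 = {n ** -0.5:.4f})")
```
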
{"title":"Minimax Estimation of Kernel Mean Embeddings","authors":"I. Tolstikhin, Bharath K. Sriperumbudur, Krikamol Muandet","doi":"10.15496/PUBLIKATION-30501","DOIUrl":"https://doi.org/10.15496/PUBLIKATION-30501","url":null,"abstract":"In this paper, we study the minimax estimation of the Bochner integral $$mu_k(P):=int_{mathcal{X}} k(cdot,x),dP(x),$$ also called as the kernel mean embedding, based on random samples drawn i.i.d.~from $P$, where $k:mathcal{X}timesmathcal{X}rightarrowmathbb{R}$ is a positive definite kernel. Various estimators (including the empirical estimator), $hat{theta}_n$ of $mu_k(P)$ are studied in the literature wherein all of them satisfy $bigl| hat{theta}_n-mu_k(P)bigr|_{mathcal{H}_k}=O_P(n^{-1/2})$ with $mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is in showing that the above mentioned rate of $n^{-1/2}$ is minimax in $|cdot|_{mathcal{H}_k}$ and $|cdot|_{L^2(mathbb{R}^d)}$-norms over the class of discrete measures and the class of measures that has an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and the density of $P$ (if it exists). This result has practical consequences in statistical applications as the mean embedding has been widely employed in non-parametric hypothesis testing, density estimation, causal inference and feature selection, through its relation to energy distance (and distance covariance).","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"126 1","pages":"86:1-86:47"},"PeriodicalIF":0.0,"publicationDate":"2016-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90873170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
Optimal Estimation of Derivatives in Nonparametric Regression
Pub Date : 2016-01-01 DOI: 10.5555/2946645.3053446
Wenlin Dai, T. Tong, M. Genton
We propose a simple framework for estimating derivatives without fitting the regression function in nonparametric regression. Unlike most existing methods that use symmetric difference quotients, our method is constructed as a linear combination of observations. It is hence very flexible and applicable to both interior and boundary points, and it includes most existing methods as special cases. Within this framework, we define the variance-minimizing estimators for any order derivative of the regression function with a fixed bias-reduction level. For the equidistant design, we derive the asymptotic variance and bias of these estimators. We also show that our new method will, for the first time, achieve the asymptotically optimal convergence rate for difference-based estimators. Finally, we provide an effective criterion for the selection of tuning parameters and demonstrate the usefulness of the proposed method through extensive simulation studies of the first- and second-order derivative estimators.
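
As a toy illustration of derivative estimates formed as linear combinations of the observations on an equidistant design, the sketch below fits a local quadratic in a sliding window and reads off its slope. The window size, noise level, and test function are assumptions, and these weights are not the paper's variance-minimizing ones.

```python
import numpy as np

def local_quadratic_derivative(x, y, window=7):
    """Estimate f'(x_i) by least-squares fitting a quadratic to the `window`
    nearest design points and reading off its slope; each estimate is a fixed
    linear combination of the observations y."""
    half = window // 2
    deriv = np.full_like(y, np.nan, dtype=float)
    for i in range(half, len(x) - half):
        xs = x[i - half:i + half + 1] - x[i]     # center the window at x_i
        ys = y[i - half:i + half + 1]
        # design matrix with columns [1, u, u^2]
        coef, *_ = np.linalg.lstsq(np.vander(xs, 3, increasing=True), ys, rcond=None)
        deriv[i] = coef[1]                       # slope of the fit at the window center
    return deriv

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)               # equidistant design
y = np.sin(x) + 0.05 * rng.normal(size=x.size)   # noisy observations
d_hat = local_quadratic_derivative(x, y)
mask = ~np.isnan(d_hat)
print("mean abs error vs cos(x):",
      round(float(np.mean(np.abs(d_hat[mask] - np.cos(x[mask])))), 3))
```
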
{"title":"Optimal Estimation of Derivatives in Nonparametric Regression","authors":"Wenlin Dai, T. Tong, M. Genton","doi":"10.5555/2946645.3053446","DOIUrl":"https://doi.org/10.5555/2946645.3053446","url":null,"abstract":"We propose a simple framework for estimating derivatives without _tting the regression function in nonparametric regression. Unlike most existing methods that use the symmetric difference quotients, our method is constructed as a linear combination of observations. It is hence very exible and applicable to both interior and boundary points, including most existing methods as special cases of ours. Within this framework, we define the variance-minimizing estimators for any order derivative of the regression function with a fixed bias-reduction level. For the equidistant design, we derive the asymptotic variance and bias of these estimators. We also show that our new method will, for the first time, achieve the asymptotically optimal convergence rate for difference-based estimators. Finally, we provide an effective criterion for selection of tuning parameters and demonstrate the usefulness of the proposed method through extensive simulation studies of the first-and second-order derivative estimators.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"48 1","pages":"164:1-164:25"},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88020190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37