SIAM journal on mathematics of data science: latest publications

Generalization error of minimum weighted norm and kernel interpolation
Q1 MATHEMATICS, APPLIED Pub Date : 2020-08-07 DOI: 10.1137/20M1359912
Weilin Li
We study the generalization error of functions that interpolate prescribed data points and are selected by minimizing a weighted norm. Under natural and general conditions, we prove that both the interpolants and their generalization errors converge as the number of parameters grows, and the limiting interpolant belongs to a reproducing kernel Hilbert space. This rigorously establishes an implicit bias of minimum weighted norm interpolation and explains why norm minimization may benefit from over-parameterization. As special cases of this theory, we study interpolation by trigonometric polynomials and spherical harmonics. Our approach is from a deterministic and approximation theory viewpoint, as opposed to a statistical or random matrix one.
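As a concrete illustration of the setup described in this abstract, the sketch below computes a minimum weighted norm interpolant for a trigonometric feature map; the weight choice w_k proportional to k^2, the data, and all variable names are assumptions made for illustration and are not taken from the paper.

import numpy as np

# Hypothetical sketch: minimum weighted norm interpolation with trigonometric
# features. Solves  min_c sum_k w_k c_k^2  subject to  Phi c = y  in closed form,
# c = W^{-1} Phi^T (Phi W^{-1} Phi^T)^{-1} y,  with W = diag(w).
rng = np.random.default_rng(0)
n, half_p = 10, 100                              # n data points, 2*half_p >> n parameters
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, n))
y = np.sin(x) + 0.3 * np.cos(3.0 * x)            # prescribed data values

freqs = np.arange(1, half_p + 1)
Phi = np.hstack([np.cos(np.outer(x, freqs)), np.sin(np.outer(x, freqs))])   # n x p feature matrix
w = np.concatenate([freqs, freqs]).astype(float) ** 2                       # assumed weights, growing with frequency

Winv_PhiT = Phi.T / w[:, None]
c = Winv_PhiT @ np.linalg.solve(Phi @ Winv_PhiT, y)
print(np.allclose(Phi @ c, y))                   # True: the data points are interpolated exactly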
Citations: 7
Normal-bundle Bootstrap
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-27 DOI: 10.1137/20M1356002
Ruda Zhang, R. Ghanem
Probabilistic models of data sets often exhibit salient geometric structure. Such a phenomenon is summed up in the manifold distribution hypothesis, and can be exploited in probabilistic learning. Here we present normal-bundle bootstrap (NBB), a method that generates new data which preserve the geometric structure of a given data set. Inspired by algorithms for manifold learning and concepts in differential geometry, our method decomposes the underlying probability measure into a marginalized measure on a learned data manifold and conditional measures on the normal spaces. The algorithm estimates the data manifold as a density ridge, and constructs new data by bootstrapping projection vectors and adding them to the ridge. We apply our method to the inference of density ridges and related statistics, and to data augmentation to reduce overfitting.
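A toy sketch of the idea on a noisy circle: here the "ridge" is crudely estimated as the circle of mean radius and the normal spaces are the radial directions. This is a simplified stand-in for illustration, not the density-ridge estimator used in the paper, and all names and constants are assumptions.

import numpy as np

# Toy normal-bundle bootstrap: decompose each point into its projection onto an
# estimated 1-D manifold (circle of mean radius) plus a residual along the normal
# (radial) direction, then generate new data by resampling those normal residuals.
rng = np.random.default_rng(1)
n = 500
theta = rng.uniform(0.0, 2.0 * np.pi, n)
radius = 1.0 + 0.1 * rng.standard_normal(n)           # noise off the manifold
data = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

r = np.linalg.norm(data, axis=1)
unit_dirs = data / r[:, None]                         # normal (radial) directions
r_ridge = r.mean()                                    # crude "ridge": circle of mean radius
residuals = r - r_ridge                               # signed offsets in the normal spaces

boot = rng.choice(residuals, size=n, replace=True)    # bootstrap the normal residuals
new_data = unit_dirs * (r_ridge + boot)[:, None]      # new points preserving the circular structure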
Citations: 3
Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-26 DOI: 10.1137/20m1359511
Elizabeth Newman, Lars Ruthotto, Joseph L. Hart, B. V. B. Waanders
Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In numerical experiments from classification and surrogate modeling, GNvpro not only solves the optimization problem more efficiently but also yields DNNs that generalize better than commonly-used optimization schemes.
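The variable projection idea that GNvpro builds on is easiest to see in the classical separable least-squares setting: the sketch below eliminates the linear weights by an inner least-squares solve and optimizes only the nonlinear parameters. The exponential model, optimizer choice, and variable names are illustrative assumptions; this is not the GNvpro method itself, which extends the idea to DNNs and cross-entropy losses.

import numpy as np
from scipy.optimize import minimize

# Classical variable projection (VarPro) for a separable model y ~ Phi(theta) @ w:
# for each candidate theta the optimal linear weights w come from an inner
# least-squares solve, so the outer optimization runs over theta alone.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 4.0, 80)
y = 2.0 * np.exp(-1.5 * x) + 0.5 * np.exp(-0.3 * x) + 0.01 * rng.standard_normal(x.size)

def basis(theta):
    return np.exp(-np.outer(x, theta))                # nonlinear in theta, linear in w

def reduced_objective(theta):
    Phi = basis(theta)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # inner solve eliminates w
    return 0.5 * np.sum((Phi @ w - y) ** 2)

res = minimize(reduced_objective, x0=np.array([1.0, 0.1]), method="Nelder-Mead")
w_hat, *_ = np.linalg.lstsq(basis(res.x), y, rcond=None)
print(res.x, w_hat)                                   # recovered decay rates and linear weights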
Citations: 14
EnResNet: ResNets Ensemble via the Feynman-Kac Formalism for Adversarial Defense and Beyond
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-13 DOI: 10.1137/19m1265302
Bao Wang, Binjie Yuan, Zuoqiang Shi, S. Osher
Empirical adversarial risk minimization is a widely used mathematical framework to robustly train deep neural nets that are resistant to adversarial attacks. However, both natural and robust accura...
Citations: 8
A Performance Guarantee for Spectral Clustering
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-10 DOI: 10.1137/20M1352193
M. Boedihardjo, Shaofeng Deng, T. Strohmer
The two-step spectral clustering method, which consists of the Laplacian eigenmap and a rounding step, is a widely used method for graph partitioning. It can be seen as a natural relaxation of the NP-hard minimum ratio cut problem. In this paper we study the central question: when is spectral clustering able to find the global solution to the minimum ratio cut problem? First we provide a condition that naturally depends on the intra- and inter-cluster connectivities of a given partition under which we may certify that this partition is the solution to the minimum ratio cut problem. Then we develop a deterministic two-to-infinity norm perturbation bound for the invariant subspace of the graph Laplacian that corresponds to the $k$ smallest eigenvalues. Finally, by combining these two results, we give a condition under which spectral clustering is guaranteed to output the global solution to the minimum ratio cut problem, which serves as a performance guarantee for spectral clustering.
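For reference, a minimal sketch of the two-step procedure the guarantee concerns (Laplacian eigenmap followed by a rounding step). The unnormalized Laplacian, k-means rounding, and the toy graph are illustrative choices, not code from the paper.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

# Two-step spectral clustering: embed nodes with the eigenvectors of the graph
# Laplacian belonging to the k smallest eigenvalues, then round with k-means.
def spectral_clustering(A, k):
    L = np.diag(A.sum(axis=1)) - A                    # unnormalized graph Laplacian
    _, U = eigh(L, subset_by_index=[0, k - 1])        # invariant subspace of the k smallest eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)   # rounding step

# Toy example: two cliques joined by a single weak edge
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 0.1
print(spectral_clustering(A, 2))                      # e.g. [0 0 0 0 1 1 1 1]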
Citations: 5
Semi-supervised Learning for Aggregated Multilayer Graphs Using Diffuse Interface Methods and Fast Matrix-Vector Products
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-10 DOI: 10.1137/20M1352028
Kai Bergermann, M. Stoll, Toni Volkmer
We generalize a graph-based multiclass semi-supervised classification technique based on diffuse interface methods to multilayer graphs. Besides the treatment of various applications with an inherent multilayer structure, we present a very flexible approach that interprets high-dimensional data in a low-dimensional multilayer graph representation. Highly efficient numerical methods involving the spectral decomposition of the corresponding differential graph operators as well as fast matrix-vector products based on the nonequispaced fast Fourier transform (NFFT) enable the rapid treatment of large and high-dimensional data sets. We perform various numerical tests putting a special focus on image segmentation. In particular, we test the performance of our method on data sets with up to 10 million nodes per layer as well as up to 104 dimensions resulting in graphs with up to 52 layers. While all presented numerical experiments can be run on an average laptop computer, the linear dependence per iteration step of the runtime on the network size in all stages of our algorithm makes it scalable to even larger and higher-dimensional problems.
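A much-simplified stand-in for the setting (not the diffuse-interface scheme or the NFFT-based fast matrix-vector products of the paper): aggregate the layers of a multilayer graph into a single weight matrix and propagate the known labels by harmonic interpolation with the graph Laplacian. All sizes and names are illustrative assumptions.

import numpy as np

# Aggregate a multilayer graph by summing the layers' weight matrices, then do
# semi-supervised label propagation by harmonic extension: solve
# L_uu * u = -L_ul * y  for the unlabeled nodes.
rng = np.random.default_rng(3)
n, n_layers = 6, 3
layers = []
for _ in range(n_layers):
    W = np.triu(rng.random((n, n)), 1)
    layers.append(W + W.T)                            # symmetric weights per layer

A = sum(layers)                                       # aggregated multilayer graph
L = np.diag(A.sum(axis=1)) - A                        # graph Laplacian

labeled = np.array([0, 5])
y_lab = np.array([+1.0, -1.0])                        # two known class labels
unlabeled = np.setdiff1d(np.arange(n), labeled)

u = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], -L[np.ix_(unlabeled, labeled)] @ y_lab)
print(np.sign(u))                                     # predicted classes for the unlabeled nodes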
Citations: 8
Variational Representations and Neural Network Estimation of Rényi Divergences
Q1 MATHEMATICS, APPLIED Pub Date : 2020-07-07 DOI: 10.1137/20m1368926
Jeremiah Birrell, P. Dupuis, M. Katsoulakis, L. Rey-Bellet, Jie Wang
We derive a new variational formula for the Rényi family of divergences, $R_\alpha(Q\|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this Rényi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions and is also key in our development of a consistency theory for Rényi divergence estimators. By applying this theory to neural network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property then the corresponding Rényi divergence estimator is consistent. In contrast to likelihood-ratio based methods, our estimators involve only expectations under $Q$ and $P$ and hence are more effective in high dimensional systems. We illustrate this via several numerical examples of neural network estimation in systems of up to 5000 dimensions.
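For context, the classical Donsker-Varadhan formula that this work generalizes can be stated as follows (for the Kullback-Leibler divergence; the Rényi analogue derived in the paper is not reproduced here):

$$ \mathrm{KL}(Q\,\|\,P) \;=\; \sup_{g}\Big\{\, \mathbb{E}_{Q}[g] \;-\; \log \mathbb{E}_{P}\big[e^{g}\big] \,\Big\}, $$

where the supremum runs over bounded measurable functions $g$; neural network estimators restrict this supremum to a parametric family of functions.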
Citations: 22
The Signature Kernel Is the Solution of a Goursat PDE
Q1 MATHEMATICS, APPLIED Pub Date : 2020-06-26 DOI: 10.1137/20M1366794
C. Salvi, Thomas Cass, J. Foster, Terry Lyons, Weixin Yang
Recently, there has been an increased interest in the development of kernel methods for learning with sequential data. The signature kernel is a learning tool with potential to handle irregularly sampled, multivariate time series. In "Kernels for sequentially ordered data" the authors introduced a kernel trick for the truncated version of this kernel, avoiding the exponential complexity that would have been involved in a direct computation. Here we show that for continuously differentiable paths, the signature kernel solves a hyperbolic PDE and recognize the connection with a well-known class of differential equations known in the literature as Goursat problems. This Goursat PDE only depends on the increments of the input sequences, does not require the explicit computation of signatures, and can be solved efficiently using state-of-the-art hyperbolic PDE numerical solvers. This yields a kernel trick for the untruncated signature kernel with the same raw complexity as the method from "Kernels for sequentially ordered data", but with the advantage that the PDE numerical scheme is well suited for GPU parallelization, which effectively reduces the complexity by a full order of magnitude in the length of the input sequences. In addition, we extend the previous analysis to the space of geometric rough paths and establish, using classical results from rough path theory, that the rough version of the signature kernel solves a rough integral equation analogous to the aforementioned Goursat PDE. Finally, we empirically demonstrate the effectiveness of our PDE kernel as a machine learning tool in various machine learning applications dealing with sequential data. We release the library sigkernel, publicly available at https://github.com/crispitagorico/sigkernel.
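A minimal illustration of the result: the untruncated signature kernel of two paths $x$ and $y$ can be read off from a finite-difference solution of the Goursat PDE $\partial^2 k/\partial s\,\partial t = \langle \dot{x}(s), \dot{y}(t)\rangle\, k(s,t)$ with boundary data $k(0,\cdot)=k(\cdot,0)=1$. The first-order scheme below is only a sketch written for this listing; the released sigkernel library uses higher-order, GPU-parallel solvers.

import numpy as np

# First-order finite-difference solver for the Goursat PDE of the signature
# kernel; K[-1, -1] approximates the kernel value for the full paths.
def signature_kernel(x, y):
    dx = np.diff(x, axis=0)                           # increments of the first path, shape (m, d)
    dy = np.diff(y, axis=0)                           # increments of the second path, shape (n, d)
    inc = dx @ dy.T                                   # inc[i, j] = <dx_i, dy_j>
    K = np.ones((dx.shape[0] + 1, dy.shape[0] + 1))   # boundary condition K(0, .) = K(., 0) = 1
    for i in range(dx.shape[0]):
        for j in range(dy.shape[0]):
            K[i + 1, j + 1] = K[i + 1, j] + K[i, j + 1] + (inc[i, j] - 1.0) * K[i, j]
    return K[-1, -1]

t = np.linspace(0.0, 1.0, 50)[:, None]
x = np.hstack([t, np.sin(2.0 * np.pi * t)])
y = np.hstack([t, np.cos(2.0 * np.pi * t)])
print(signature_kernel(x, y))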
Citations: 32
Memory-Efficient Structured Convex Optimization via Extreme Point Sampling
Q1 MATHEMATICS, APPLIED Pub Date : 2020-06-19 DOI: 10.1137/20m1358037
Nimita Shinde, Vishnu Narayanan, J. Saunderson
Memory is a key computational bottleneck when solving large-scale convex optimization problems such as semidefinite programs (SDPs). In this paper, we focus on the regime in which storing an $n \times n$ matrix decision variable is prohibitive. To solve SDPs in this regime, we develop a randomized algorithm that returns a random vector whose covariance matrix is near-feasible and near-optimal for the SDP. We show how to develop such an algorithm by modifying the Frank-Wolfe algorithm to systematically replace the matrix iterates with random vectors. As an application of this approach, we show how to implement the Goemans-Williamson approximation algorithm for MaxCut using $\mathcal{O}(n)$ memory in addition to the memory required to store the problem instance. We then extend our approach to deal with a broader range of structured convex optimization problems, replacing decision variables with random extreme points of the feasible region.
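One simple way to picture the low-memory idea: a Frank-Wolfe-style matrix update $X_{k+1} = (1-\gamma_k) X_k + \gamma_k v_k v_k^T$ can be tracked through a single random vector $z$ with $\mathbb{E}[zz^T] = X_k$ by swapping $z$ for the new extreme point with probability $\gamma_k$. The sketch below only illustrates this covariance-tracking principle with a placeholder oracle; it is not the algorithm of the paper, whose construction may differ.

import numpy as np

# Track a Frank-Wolfe matrix iterate in O(n) memory through a random vector z
# with E[z z^T] equal to the (never formed) n x n iterate: replacing z by the
# new rank-one factor v with probability gamma reproduces, in expectation,
# the update X <- (1 - gamma) X + gamma v v^T.
rng = np.random.default_rng(4)
n, steps = 1000, 200
z = np.zeros(n)                                       # represents X_0 = 0

for k in range(steps):
    gamma = 2.0 / (k + 2.0)                           # standard Frank-Wolfe step size
    v = rng.choice([-1.0, 1.0], size=n)               # placeholder for the linear-minimization oracle output
    if rng.random() < gamma:
        z = v
# E[z z^T] now equals the convex combination of the sampled v v^T terms
print(z[:5])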
Citations: 4
Two Steps at a Time---Taking GAN Training in Stride with Tseng's Method
Q1 MATHEMATICS, APPLIED Pub Date : 2020-06-16 DOI: 10.1137/21m1420939
A. Böhm, Michael Sedlmayer, E. R. Csetnek, R. Boț
Motivated by the training of Generative Adversarial Networks (GANs), we study methods for solving minimax problems with additional nonsmooth regularizers. We do so by employing monotone operator theory, in particular the Forward-Backward-Forward (FBF) method, which avoids the known issue of limit cycling by correcting each update by a second gradient evaluation. Furthermore, we propose a seemingly new scheme which recycles old gradients to mitigate the additional computational cost. In doing so we rediscover a known method, related to Optimistic Gradient Descent Ascent (OGDA). For both schemes we prove novel convergence rates for convex-concave minimax problems via a unifying approach. The derived error bounds are in terms of the gap function for the ergodic iterates. For the deterministic and the stochastic problem we show a convergence rate of $\mathcal{O}(1/k)$ and $\mathcal{O}(1/\sqrt{k})$, respectively. We complement our theoretical results with empirical improvements in the training of Wasserstein GANs on the CIFAR10 dataset.
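For reference, a sketch of the Tseng Forward-Backward-Forward step in the unconstrained smooth case, applied to the bilinear saddle-point problem min_u max_v u^T A v through its monotone operator F(u, v) = (A v, -A^T u). The proximal term for nonsmooth regularizers and the gradient-recycling variant studied in the paper are omitted, and the step size and problem are illustrative assumptions.

import numpy as np

# Tseng's FBF iteration: a forward step followed by a forward correction that
# re-evaluates the operator at the intermediate point (two evaluations per step).
rng = np.random.default_rng(5)
m = 10
A = rng.standard_normal((m, m))
u, v = rng.standard_normal(m), rng.standard_normal(m)
lam = 0.9 / np.linalg.norm(A, 2)                      # step size below 1/L with L = ||A||_2

def F(u, v):
    return A @ v, -A.T @ u                            # monotone operator of the bilinear saddle problem

start = np.linalg.norm(np.concatenate([u, v]))
for _ in range(2000):
    Fu, Fv = F(u, v)
    ub, vb = u - lam * Fu, v - lam * Fv               # forward step
    Fub, Fvb = F(ub, vb)
    u, v = ub - lam * (Fub - Fu), vb - lam * (Fvb - Fv)   # correction with the second evaluation
print(start, np.linalg.norm(np.concatenate([u, v])))  # the iterate norm shrinks toward the saddle point (0, 0)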
Citations: 13