Latest publications in SIAM journal on mathematics of data science

FANOK: Knockoffs in Linear Time
Q1 MATHEMATICS, APPLIED Pub Date: 2020-06-15 DOI: 10.1137/20m1363698
Armin Askari, Quentin Rebjock, A. d’Aspremont, L. Ghaoui
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems. Identifying the knockoff distribution requires solving a large scale semidefinite program, for which we derive several efficient methods. One handles generic covariance matrices, with complexity scaling as $O(p^3)$, where $p$ is the ambient dimension, while another assumes a rank-$k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^2)$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates, with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500,000$.
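The sampling step these algorithms accelerate can be made concrete with a short sketch. Below is a minimal numpy implementation of the standard Gaussian model-X knockoff sampler; the knockoff vector s is taken from the simple equicorrelated heuristic as a stand-in for the paper's SDP and factor-model solvers, and the covariance matrix and problem sizes are illustrative assumptions.

```python
# Minimal sketch: sample Gaussian model-X knockoffs given Sigma and s.
# The equicorrelated s below is a cheap stand-in for the paper's SDP.
import numpy as np

def sample_gaussian_knockoffs(X, Sigma, s, seed=None):
    """Draw knockoffs row-wise from the standard model-X conditional
    X_tilde | X ~ N(X - X Sigma^{-1} D, 2D - D Sigma^{-1} D), D = diag(s)."""
    rng = np.random.default_rng(seed)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)            # Sigma^{-1} D
    mean = X - X @ Sigma_inv_D
    cond_cov = 2.0 * D - D @ Sigma_inv_D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(len(s)))
    return mean + rng.standard_normal(X.shape) @ L.T

Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
s = np.full(2, min(1.0, 2 * np.linalg.eigvalsh(Sigma).min()))  # equicorrelated
X = np.random.default_rng(0).multivariate_normal(np.zeros(2), Sigma, size=100)
X_knock = sample_gaussian_knockoffs(X, Sigma, s, seed=1)
```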
Citations: 2
Overparameterization and Generalization Error: Weighted Trigonometric Interpolation
Q1 MATHEMATICS, APPLIED Pub Date: 2020-06-15 DOI: 10.1137/21m1390955
Yuege Xie, H. Chou, H. Rauhut, Rachel A. Ward
Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem. We study a random Fourier series model, where the task is to estimate the unknown Fourier coefficients from equidistant samples. We derive exact expressions for the generalization error of both plain and weighted least squares estimators. We show precisely how a bias towards smooth interpolants, in the form of weighted trigonometric interpolation, can lead to smaller generalization error in the overparameterized regime compared to the underparameterized regime. This provides insight into the power of overparameterization, which is common in modern machine learning.
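A toy experiment makes the contrast concrete: with n equidistant samples and p > n Fourier modes, compare the plain minimum-norm interpolant with a weighted one that biases energy toward low frequencies. The ground-truth signal and the weights w_j = 1/(1+j) below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

n, p = 20, 101                          # overparameterized: p modes, n samples
t = np.arange(n) / n                    # equidistant sample points in [0, 1)
freqs = np.arange(p)
A = np.cos(2 * np.pi * np.outer(t, freqs))       # Fourier design matrix

truth = np.zeros(p)
truth[:4] = [1.0, 0.5, 0.25, 0.125]     # a smooth (low-frequency) signal
y = A @ truth

c_plain = np.linalg.pinv(A) @ y         # plain min-norm interpolant

w = 1.0 / (1.0 + freqs)                 # down-weight high frequencies
c_weighted = w * (np.linalg.pinv(A * w) @ y)     # min ||c/w|| s.t. Ac = y

tt = np.linspace(0, 1, 2000, endpoint=False)     # dense grid for test error
At = np.cos(2 * np.pi * np.outer(tt, freqs))
for name, c in [("plain", c_plain), ("weighted", c_weighted)]:
    print(name, "generalization MSE:", np.mean((At @ c - At @ truth) ** 2))
```

The plain min-norm fit spreads coefficient mass across aliased high frequencies, while the weighted fit concentrates it on the low modes, which is the smoothness bias the paper quantifies.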
Citations: 3
The Trimmed Lasso: Sparse Recovery Guarantees and Practical Optimization by the Generalized Soft-Min Penalty
Q1 MATHEMATICS, APPLIED Pub Date: 2020-05-18 DOI: 10.1137/20M1330634
Tal Amir, R. Basri, B. Nadler
We present a new approach to solve the sparse approximation or best subset selection problem, namely find a $k$-sparse vector ${\bf x}\in\mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A{\bf x}-{\bf y}\rVert_2$. We consider a regularized approach, whereby this residual is penalized by the non-convex trimmed lasso, defined as the $\ell_1$-norm of ${\bf x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the generalized soft-min. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to the current state of the art.
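The penalty itself is simple to state and compute; here is a minimal sketch (the example vector is arbitrary). Note that the penalty vanishes exactly on $k$-sparse vectors, which is what drives the recovery guarantees.

```python
# Trimmed-lasso penalty: the l1 norm of x excluding its k largest-magnitude
# entries; it is zero if and only if x is k-sparse.
import numpy as np

def trimmed_lasso(x, k):
    mags = np.sort(np.abs(x))          # magnitudes in ascending order
    return mags[:len(mags) - k].sum()  # drop the k largest, sum the rest

x = np.array([5.0, -3.0, 0.1, 0.02, 0.0])
print(trimmed_lasso(x, 2))             # 0.12: how far x is from 2-sparsity
```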
Citations: 11
Spectral Discovery of Jointly Smooth Features for Multimodal Data
Q1 MATHEMATICS, APPLIED Pub Date: 2020-04-09 DOI: 10.1137/21M141590X
Or Yair, Felix Dietrich, Rotem Mulayoff, R. Talmon, I. Kevrekidis
In this paper, we propose a spectral method for deriving functions that are jointly smooth on multiple observed manifolds. Our method is unsupervised and primarily consists of two steps. First, using kernels, we obtain a subspace spanning smooth functions on each manifold. Then, we apply a spectral method to the obtained subspaces and discover functions that are jointly smooth on all manifolds. We show analytically that our method is guaranteed to provide a set of orthogonal functions that are as jointly smooth as possible, ordered from the smoothest to the least smooth. In addition, we show that the proposed method can be efficiently extended to unseen data using the Nystrom method. We demonstrate the proposed method on both simulated and real measured data and compare the results to nonlinear variants of the seminal Canonical Correlation Analysis (CCA). Particularly, we show superior results for sleep stage identification. In addition, we show how the proposed method can be leveraged for finding minimal realizations of parameter spaces of nonlinear dynamical systems.
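A rough numpy sketch of the two-step procedure on two synthetic modalities sharing one latent variable: build a smooth kernel eigenbasis per modality, then take an SVD of the stacked bases; directions with singular values near sqrt(2) lie in both subspaces and are hence jointly smooth. Kernel bandwidths, basis sizes, and the synthetic data are ad hoc assumptions.

```python
import numpy as np

def smooth_basis(X, d=10, eps=1.0):
    """Top-d eigenvectors of a Gaussian kernel: an orthonormal basis of
    smooth functions on the observed manifold."""
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / eps)
    _, vecs = np.linalg.eigh(K)        # eigenvalues ascending
    return vecs[:, -d:]

rng = np.random.default_rng(0)
common = rng.uniform(0, 1, 300)                    # shared latent variable
X1 = np.c_[common, rng.uniform(0, 1, 300)]         # modality 1: common + nuisance
X2 = np.c_[np.sin(2 * np.pi * common), rng.uniform(0, 1, 300)]  # modality 2

U = np.hstack([smooth_basis(X1), smooth_basis(X2)])
F, svals, _ = np.linalg.svd(U, full_matrices=False)
# Singular values close to sqrt(2) flag jointly smooth directions; the
# corresponding columns of F (beyond the constant one) vary with `common` only.
print(np.round(svals[:4], 3))
```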
Citations: 6
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Q1 MATHEMATICS, APPLIED Pub Date: 2020-03-16 DOI: 10.1137/20m1331524
K. Khamaru, A. Pananjady, Feng Ruan, M. Wainwright, Michael I. Jordan
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
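For reference, the baseline under study is easy to state in code. The sketch below runs tabular TD(0) with Polyak-Ruppert iterate averaging under a generative model on an arbitrary 3-state chain (the chain, rewards, and step-size exponent are illustrative choices) and reports the ell_infty error against the exact value function.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])              # policy's transition matrix
r = np.array([1.0, 0.0, -1.0])              # expected per-state rewards
gamma = 0.9
v_star = np.linalg.solve(np.eye(3) - gamma * P, r)   # exact value function

theta = np.zeros(3)                          # TD(0) iterate
theta_bar = np.zeros(3)                      # Polyak-Ruppert average
for t in range(1, 200_001):
    s = rng.integers(3)                      # generative model: any state
    s_next = rng.choice(3, p=P[s])           # one sampled transition
    td_err = r[s] + gamma * theta[s_next] - theta[s]
    theta[s] += td_err / t ** 0.75           # polynomial step size
    theta_bar += (theta - theta_bar) / t     # running iterate average

print(np.max(np.abs(theta_bar - v_star)))    # ell_infty error of averaged TD
```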
Citations: 39
Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks
Q1 MATHEMATICS, APPLIED Pub Date: 2020-03-07 DOI: 10.1137/20M1324089
L. Cowen, K. Devkota, Xiaozhe Hu, James M. Murphy, Kaiyi Wu
Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
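The core quantity is easy to approximate directly from its definition. The sketch below computes a truncated DSD matrix for a small example graph: each node gets a profile of expected visit counts over T random-walk steps, and DSD is the l1 distance between profiles. The paper's closed forms and fast solvers avoid both the truncation and the dense matrix powers used here.

```python
import numpy as np

def dsd_matrix(A, T=100):
    """Truncated diffusion state distance: l1 distances between T-step
    expected visit-count profiles of random walks from each node."""
    deg = A.sum(1)
    P = A / deg[:, None]              # row-stochastic random-walk matrix
    n = len(A)
    H = np.zeros((n, n))              # H[u] = sum_{t<T} P^t[u]
    Pt = np.eye(n)
    for _ in range(T):
        H += Pt
        Pt = Pt @ P
    return np.abs(H[:, None, :] - H[None, :, :]).sum(-1)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)   # a small example graph
print(np.round(dsd_matrix(A), 2))
```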
Citations: 5
Diversity sampling is an implicit regularization for kernel methods
Q1 MATHEMATICS, APPLIED Pub Date: 2020-02-20 DOI: 10.1137/20M1320031
M. Fanuel, J. Schreurs, J. Suykens
Kernel methods have achieved very good performance on large scale regression and classification problems by using the Nystrom method and preconditioning techniques. The Nystrom approximation, based on a subset of landmarks, gives a low rank approximation of the kernel matrix, and is known to provide a form of implicit regularization. We further elaborate on the impact of sampling diverse landmarks for constructing the Nystrom approximation in supervised as well as unsupervised kernel methods. By using Determinantal Point Processes for sampling, we obtain additional theoretical results concerning the interplay between diversity and regularization. Empirically, we demonstrate the advantages of training kernel methods based on subsets made of diverse points. In particular, if the dataset has a dense bulk and a sparser tail, we show that Nystrom kernel regression with diverse landmarks increases the accuracy of the regression in sparser regions of the dataset, with respect to a uniform landmark sampling. A greedy heuristic is also proposed to select diverse samples of significant size within large datasets when exact DPP sampling is not practically feasible.
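A sketch of the kind of greedy heuristic the abstract alludes to: select landmarks by greedily maximizing the determinant of their kernel submatrix (equivalently, a pivoted incomplete Cholesky), then form the Nystrom approximation. The RBF kernel, data, and landmark count are placeholder assumptions, not the paper's exact procedure.

```python
import numpy as np

def rbf(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_landmarks(K, m):
    """Greedy max-det landmark selection: repeatedly pivot on the largest
    residual diagonal entry (a pivoted incomplete Cholesky of K)."""
    n = len(K)
    d = K.diagonal().copy()          # residual diagonal
    G = np.zeros((n, m))
    idx = []
    for j in range(m):
        i = int(np.argmax(d))
        idx.append(i)
        G[:, j] = (K[:, i] - G @ G[i]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
    return idx

X = np.random.default_rng(0).normal(size=(500, 3))
K = rbf(X)
idx = greedy_landmarks(K, 20)
K_nm, K_mm = K[:, idx], K[np.ix_(idx, idx)]
K_hat = K_nm @ np.linalg.solve(K_mm + 1e-8 * np.eye(len(idx)), K_nm.T)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative Nystrom error
```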
Citations: 12
Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
Q1 MATHEMATICS, APPLIED Pub Date: 2020-02-13 DOI: 10.1137/21m1394308
Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan
Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step-size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners in selecting algorithms that work broadly without tweaking the hyperparameters. In this work, blending the "geometrization" technique introduced by Lei & Jordan (2016) and the SARAH algorithm of Nguyen et al. (2017), we propose the Geometrized SARAH algorithm for non-convex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak-Łojasiewicz (PL) constant, if present. In addition, it simultaneously achieves the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
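For orientation, here is the SARAH recursion the paper builds on, run on a toy least-squares problem. The fixed inner-loop length and step size below are arbitrary; the paper's Geometrized SARAH instead draws the inner-loop length from a geometric distribution, which is the "geometrization" that yields the adaptivity guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):                      # gradient of f_i(w) = 0.5*(a_i.w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

eta, w = 0.01, np.zeros(d)
for epoch in range(30):
    v = A.T @ (A @ w - b) / n          # full gradient at the snapshot
    w_prev = w.copy()
    w = w - eta * v
    for _ in range(n):                 # fixed inner loop; geometric in the paper
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_prev, i) + v    # SARAH recursion
        w_prev = w.copy()
        w = w - eta * v

print(np.linalg.norm(A.T @ (A @ w - b)) / n)        # final gradient norm
```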
Citations: 18
Mean-Field Controls with Q-Learning for Cooperative MARL: Convergence and Complexity Analysis
Q1 MATHEMATICS, APPLIED Pub Date: 2020-02-10 DOI: 10.1137/20m1360700
Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu
Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds a mathematical framework to approximate cooperative MARL by a mean-field control (MFC) framework, and shows that the approximation error is of order $O(\frac{1}{\sqrt{N}})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents $N$. Empirical studies for the network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when $N$ is large, for instance when $N>50$.
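The lifting idea can be illustrated in a few lines: the central controller observes the population distribution (here just the fraction of agents in one of two states, binned) and runs ordinary tabular Q-learning on that lifted MDP. The toy congestion dynamics, reward, and all constants below are invented for illustration; the paper's MFC-K-Q replaces the table with kernel regression on distribution space.

```python
# Heavily simplified illustration of "lifting" cooperative MARL to a
# mean-field control MDP: the controller's state is the (binned) fraction
# mu of agents in state 1, and tabular Q-learning runs on that lifted MDP.
import numpy as np

rng = np.random.default_rng(0)
B = 11                                  # bins discretizing mu in [0, 1]
Q = np.zeros((B, 2))                    # lifted Q-function: (bin, action)

def step(mu, a):
    """Toy dynamics: action 1 pushes agents into state 1, action 0 pulls
    them out; the reward mu*(1-mu) penalizes over- and under-crowding."""
    drift = 0.1 if a == 1 else -0.1
    mu_next = float(np.clip(mu + drift + 0.02 * rng.normal(), 0.0, 1.0))
    return mu_next, mu_next * (1.0 - mu_next)

gamma, alpha, eps, mu = 0.9, 0.1, 0.1, 0.0
for _ in range(50_000):
    s = int(round(mu * (B - 1)))
    a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
    mu_next, r = step(mu, a)
    s_next = int(round(mu_next * (B - 1)))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    mu = mu_next

print(Q.argmax(1))    # learned action per bin: steer the population to mu = 1/2
```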
Citations: 37
Persistent Cohomology for Data With Multicomponent Heterogeneous Information.
Q1 MATHEMATICS, APPLIED Pub Date: 2020-01-01 Epub Date: 2020-05-19 DOI: 10.1137/19m1272226
Zixuan Cang, Guo-Wei Wei

Persistent homology is a powerful tool for characterizing the topology of a data set at various geometric scales. When applied to the description of molecular structures, persistent homology can capture the multiscale geometric features and reveal certain interaction patterns in terms of topological invariants. However, in addition to the geometric information, there is a wide variety of nongeometric information of molecular structures, such as element types, atomic partial charges, atomic pairwise interactions, and electrostatic potential functions, that is not described by persistent homology. Although element-specific homology and electrostatic persistent homology can encode some nongeometric information into geometry based topological invariants, it is desirable to have a mathematical paradigm to systematically embed both geometric and nongeometric information, i.e., multicomponent heterogeneous information, into unified topological representations. To this end, we propose a persistent cohomology based framework for the enriched representation of data. In our framework, nongeometric information can either be distributed globally or reside locally on the datasets in the geometric sense and can be properly defined on topological spaces, i.e., simplicial complexes. Using the proposed persistent cohomology based framework, enriched barcodes are extracted from datasets to represent heterogeneous information. We consider a variety of datasets to validate the present formulation and illustrate the usefulness of the proposed method based on persistent cohomology. It is found that the proposed framework outperforms or at least matches the state-of-the-art methods in the protein-ligand binding affinity prediction from massive biomolecular datasets without resorting to any deep learning formulation.
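To make the barcode notion concrete, here is a self-contained sketch of a 0-dimensional persistence computation (connected components under a growing distance threshold, via a Kruskal-style union-find). The paper's framework goes further by attaching nongeometric information, such as partial charges, to such bars through cohomology; that enrichment is not attempted here.

```python
import numpy as np

def zero_dim_barcode(X):
    """0-dimensional persistence of a point cloud: track connected
    components as the distance threshold grows (Kruskal + union-find)."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    bars = []
    for w, i, j in edges:                   # each merge kills a component
        ri, rj = find(i), find(j)           # born at threshold 0
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, w))
    bars.append((0.0, np.inf))              # one component never dies
    return bars

X = np.random.default_rng(0).normal(size=(12, 2))
for birth, death in zero_dim_barcode(X):
    print(f"[{birth:.2f}, {death:.2f})")
```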

Citations: 12