Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification

J. Mach. Learn. Res. Pub Date: 2022-06-07. DOI: 10.48550/arXiv.2206.03345

G. Zhang, S. Fattahi, Richard Y. Zhang

Volume 23 (1), pages 163:1-163:55.

Citations: 4

Abstract

We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
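To make the setting concrete, the following is a minimal sketch of a preconditioned gradient step on an overparameterized problem. The abstract does not state the form of the preconditioner, so everything specific here is an assumption made for illustration: the cost $\phi(M)=\tfrac{1}{2}\|M-M^{\star}\|_F^2$, the right preconditioner $(X^{T}X+\eta I)^{-1}$, the damping choice $\eta=\|XX^{T}-M^{\star}\|_F$, the step size, and the hypothetical function name `precgd`. It should not be read as the authors' implementation.

```python
import numpy as np


def precgd(M_star, X0, step=0.2, iters=500, tol=1e-8):
    """Sketch of preconditioned gradient descent on
    f(X) = 0.5 * ||X X^T - M_star||_F^2.

    Assumed update (not given in the abstract):
        X <- X - step * grad_f(X) @ inv(X^T X + eta * I),
        eta = ||X X^T - M_star||_F
    """
    X = X0.copy()
    r = X.shape[1]
    errors = []                            # residual norm before each step
    for _ in range(iters):
        E = X @ X.T - M_star               # residual; equals grad of phi at X X^T
        err = np.linalg.norm(E)            # Frobenius norm of the residual
        errors.append(err)
        if err < tol:                      # stop before eta becomes negligible
            break
        grad = 2.0 * E @ X                 # grad of f, using symmetry of E
        P = X.T @ X + err * np.eye(r)      # r x r preconditioner, damped by eta
        # grad @ inv(P), computed with a solve; valid because P is symmetric
        X = X - step * np.linalg.solve(P, grad.T).T
    return X, errors


# Overparameterized toy instance: true rank 2, search rank 4.
rng = np.random.default_rng(0)
n, r_star, r = 20, 2, 4
X_star = rng.standard_normal((n, r_star))
M_star = X_star @ X_star.T
X0 = rng.standard_normal((n, r))

X, errors = precgd(M_star, X0)
final_err = np.linalg.norm(X @ X.T - M_star)
sigma_min = np.linalg.svd(X, compute_uv=False)[-1]
print(f"initial residual: {errors[0]:.3e}")
print(f"final residual:   {final_err:.3e}")
print(f"smallest singular value of X: {sigma_min:.3e}")
```

The preconditioner is only $r\times r$, so each step costs one small linear solve on top of the gradient, which fits the abstract's description of it as inexpensive. The printed smallest singular value of $X$ is a rough numerical proxy for the rank deficiency that the abstract uses as a certificate of global optimality. It is not a substitute for checking second-order stationarity.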
