Tight query complexity lower bounds for PCA via finite sample deformed Wigner law

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. Pub Date: 2018-04-04. DOI: 10.1145/3188745.3188796

Max Simchowitz, A. Alaoui, B. Recht


Citations: 33

Abstract

We prove a query complexity lower bound for approximating the top $r$-dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $M \in \mathbb{R}^{d \times d}$, an algorithm $\mathsf{Alg}$ is allowed to make $T$ exact queries of the form $w^{(i)} = M v^{(i)}$ for $i \in \{1, \dots, T\}$, where $v^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements $\{v^{(j)}, w^{(j)}\}_{1 \le j \le i-1}$. We show that for every $\mathrm{gap} \in (0, 1/2]$, there exists a distribution over matrices $M$ for which 1) $\mathrm{gap}_r(M) = \Omega(\mathrm{gap})$ (where $\mathrm{gap}_r(M)$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $M$), and 2) any $\mathsf{Alg}$ which takes fewer than $\mathrm{const} \times r \log d / \sqrt{\mathrm{gap}}$ queries fails (with overwhelming probability) to identify a matrix $V \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle V, M V \rangle \ge (1 - \mathrm{const} \times \mathrm{gap}) \sum_{i=1}^{r} \lambda_i(M)$. Our bound requires only that $d$ is a small polynomial in $1/\mathrm{gap}$ and $r$, and matches the upper bounds of Musco and Musco '15. Moreover, it establishes a strict separation between convex optimization and "strict-saddle" non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-$r$ spike in a deformed Wigner model $M = W + \lambda U U^\top$, where $W$ is from the Gaussian Orthogonal Ensemble, $U$ is uniform on the $d \times r$ Stiefel manifold, and $\lambda > 1$ governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the 'accelerated' $\mathrm{gap}^{-1/2}$ in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation $\lambda$, when $\lambda$ is near the "phase transition" $\lambda = 1$. To verify that $d$ need only be polynomial in $\mathrm{gap}^{-1}$ and $r$, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. '16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
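The query model and the deformed Wigner instance described above can be illustrated with a small numerical sketch. It is not the paper's construction or proof technique, and several choices are assumptions made only for illustration: the GOE is scaled so that its bulk spectrum is approximately [-2, 2]; the normalized gap is taken to be (lambda_r - lambda_{r+1}) / lambda_r; the classical asymptotic spike location lambda + 1/lambda is used for the top eigenvalues; and plain block power iteration stands in for a generic query algorithm (it is not the block Krylov method behind the Musco and Musco upper bound). All function and class names are hypothetical.

```python
import numpy as np


def sample_deformed_wigner(d, r, lam, rng):
    """Draw M = W + lam * U U^T (illustrative normalization, see lead-in).

    W is GOE-like with off-diagonal variance 1/d, so its bulk spectrum is
    approximately [-2, 2]; U has orthonormal columns drawn uniformly.
    """
    G = rng.standard_normal((d, d))
    W = (G + G.T) / np.sqrt(2 * d)
    Q, R = np.linalg.qr(rng.standard_normal((d, r)))
    U = Q * np.sign(np.diag(R))  # sign fix makes U Haar-distributed
    return W + lam * (U @ U.T), U


def asymptotic_gap(lam):
    """Assumed normalized eigengap as d -> infinity, for lam > 1.

    Spike eigenvalues sit near lam + 1/lam and the bulk edge near 2, so the
    gap equals (lam - 1)^2 / (lam^2 + 1), roughly (lam - 1)^2 / 2 near 1.
    """
    top = lam + 1.0 / lam
    return (top - 2.0) / top


class MatVecOracle:
    """Exposes M only through queries w = M v and counts them."""

    def __init__(self, M):
        self._M = M
        self.num_queries = 0

    def __call__(self, v):
        self.num_queries += 1
        return self._M @ v


def block_power_method(matvec, d, r, num_iters, rng):
    """Plain block power iteration; each iteration spends r queries."""
    V, _ = np.linalg.qr(rng.standard_normal((d, r)))
    for _ in range(num_iters):
        W = np.column_stack([matvec(V[:, j]) for j in range(r)])
        V, _ = np.linalg.qr(W)  # re-orthonormalize the block
    return V


def captured_fraction(M, V, r):
    """Return <V, M V> divided by the sum of the top r eigenvalues of M."""
    top = np.linalg.eigvalsh(M)[-r:]  # eigvalsh sorts in ascending order
    return np.trace(V.T @ M @ V) / top.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, lam = 300, 2, 2.0
    M, _ = sample_deformed_wigner(d, r, lam, rng)
    oracle = MatVecOracle(M)
    V = block_power_method(oracle, d, r, num_iters=50, rng=rng)
    print("queries used:", oracle.num_queries)
    print("captured fraction:", captured_fraction(M, V, r))
    print("asymptotic gap:", asymptotic_gap(lam))
```

Under these assumptions, writing lambda = 1 + eps gives a gap of order eps^2, so a query count scaling as 1/sqrt(gap) scales as 1/eps. This is one way to read the correspondence between the eigengap and the perturbation size that the abstract mentions. Lowering `lam` toward 1 in the sketch shrinks the gap and slows the convergence of the iteration.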


