Tight query complexity lower bounds for PCA via finite sample deformed Wigner law
Max Simchowitz, A. Alaoui, B. Recht
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018-04-04
DOI: 10.1145/3188745.3188796 (https://doi.org/10.1145/3188745.3188796)
Citations: 33
Abstract
We prove a query complexity lower bound for approximating the top $r$-dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $M \in \mathbb{R}^{d \times d}$, an algorithm $\mathsf{Alg}$ is allowed to make $T$ exact queries of the form $w^{(i)} = M v^{(i)}$ for $i \in \{1, \dots, T\}$, where $v^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements $\{v^{(j)}, w^{(j)}\}_{1 \le j \le i-1}$. We show that for every $\mathrm{gap} \in (0, 1/2]$, there exists a distribution over matrices $M$ for which 1) $\mathrm{gap}_r(M) = \Omega(\mathrm{gap})$ (where $\mathrm{gap}_r(M)$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $M$), and 2) any $\mathsf{Alg}$ which takes fewer than $\mathrm{const} \times r \log d / \sqrt{\mathrm{gap}}$ queries fails (with overwhelming probability) to identify a matrix $V \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle V, M V \rangle \ge (1 - \mathrm{const} \times \mathrm{gap}) \sum_{i=1}^{r} \lambda_i(M)$. Our bound requires only that $d$ is a small polynomial in $1/\mathrm{gap}$ and $r$, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-$r$ spike in a deformed Wigner model $M = W + \lambda U U^\top$, where $W$ is from the Gaussian Orthogonal Ensemble, $U$ is uniform on the $d \times r$ Stiefel manifold, and $\lambda > 1$ governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ $\mathrm{gap}^{-1/2}$ in the rate follows as a consequence of the correspondence between the asymptotic eigengap and the size of the perturbation $\lambda$, when $\lambda$ is near the “phase transition” $\lambda = 1$.
To verify that $d$ need only be polynomial in $\mathrm{gap}^{-1}$ and $r$, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
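The correspondence between the eigengap and the perturbation size mentioned in the abstract can be made concrete with a short calculation. The sketch below uses the standard asymptotics for spiked Wigner matrices and rests on two assumptions that the abstract does not spell out: $W$ is normalized so that its bulk spectrum fills $[-2, 2]$ (which places the phase transition at $\lambda = 1$), and the normalized gap is taken to be $(\lambda_r - \lambda_{r+1})/\lambda_r$.

```latex
% Asymptotic eigenvalues of M = W + \lambda U U^\top for \lambda > 1, as d \to \infty
\lambda_i(M) \to \lambda + \frac{1}{\lambda} \quad (1 \le i \le r),
\qquad
\lambda_{r+1}(M) \to 2,

% Resulting normalized eigengap
\mathrm{gap}_r(M) \to
\frac{\lambda + \lambda^{-1} - 2}{\lambda + \lambda^{-1}}
= \frac{(\lambda - 1)^2}{\lambda^2 + 1}.

% Near the phase transition, with \lambda = 1 + \varepsilon and \varepsilon \to 0
\mathrm{gap}_r(M) \approx \frac{\varepsilon^2}{2}
\quad\Longleftrightarrow\quad
\varepsilon \approx \sqrt{2\,\mathrm{gap}}.
```

Under these assumptions, a spike that exceeds the threshold by $\lambda - 1 \asymp \sqrt{\mathrm{gap}}$ produces an eigengap of order $\mathrm{gap}$, so a rate that scales like $1/(\lambda - 1)$ reads as $\mathrm{gap}^{-1/2}$ when expressed in terms of the gap.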
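The oracle model and the deformed Wigner instance can be simulated numerically. The following is a minimal sketch, not the paper's construction or proof: the names `sample_deformed_wigner`, `MatVecOracle`, and `block_power_method` are hypothetical, the normalization of $W$ (bulk spectrum near $[-2, 2]$) is an assumption, and the block power method is only a simple adaptive baseline, not the algorithm of Musco and Musco ’15.

```python
import numpy as np


def sample_deformed_wigner(d, r, lam, rng):
    """Sample M = W + lam * U U^T (hypothetical helper, assumed normalization)."""
    g = rng.standard_normal((d, d))
    # GOE draw: off-diagonal variance 1/d, so the bulk spectrum is close to [-2, 2].
    w = (g + g.T) / np.sqrt(2.0 * d)
    q, upper = np.linalg.qr(rng.standard_normal((d, r)))
    # Fixing the signs of R's diagonal makes U Haar-distributed on the Stiefel manifold.
    u = q * np.sign(np.diag(upper))
    return w + lam * (u @ u.T), u


class MatVecOracle:
    """Exposes M only through exact queries w = M v, and counts them."""

    def __init__(self, m):
        self._m = m
        self.num_queries = 0

    def query(self, v):
        self.num_queries += 1
        return self._m @ v


def block_power_method(oracle, d, r, num_iters, rng):
    """Adaptive baseline: every new query depends on the earlier responses."""
    v, _ = np.linalg.qr(rng.standard_normal((d, r)))
    for _ in range(num_iters):
        # One round costs r queries, one per column of the current block.
        responses = np.column_stack([oracle.query(v[:, j]) for j in range(r)])
        v, _ = np.linalg.qr(responses)
    return v


rng = np.random.default_rng(0)
d, r, lam = 300, 2, 2.0
m, u = sample_deformed_wigner(d, r, lam, rng)
oracle = MatVecOracle(m)
v_hat = block_power_method(oracle, d, r, num_iters=60, rng=rng)

eigvals = np.linalg.eigvalsh(m)[::-1]  # sorted in descending order
objective = np.trace(v_hat.T @ m @ v_hat)  # the quantity <V, M V>

print("queries used:", oracle.num_queries)
print("top r+1 eigenvalues:", eigvals[: r + 1])
print("objective / sum of top-r eigenvalues:", objective / eigvals[:r].sum())
```

With `lam = 2.0` the spike sits far from the phase transition, so the gap is large and the baseline converges in few queries. Moving `lam` toward 1 shrinks the gap and makes the same method need more queries, which is the regime the lower bound addresses.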