基于有限样本变形维格纳定律的主成分分析紧密查询复杂度下界

Max Simchowitz, A. Alaoui, B. Recht
{"title":"基于有限样本变形维格纳定律的主成分分析紧密查询复杂度下界","authors":"Max Simchowitz, A. Alaoui, B. Recht","doi":"10.1145/3188745.3188796","DOIUrl":null,"url":null,"abstract":"We prove a query complexity lower bound for approximating the top r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝd × d, an algorithm Alg is allowed to make T exact queries of the form w(i) = M v(i) for i in {1,...,T}, where v(i) is drawn from a distribution which depends arbitrarily on the past queries and measurements {v(j),w(i)}1 ≤ j ≤ i−1. We show that for every gap ∈ (0,1/2], there exists a distribution over matrices M for which 1) gapr(M) = Ω(gap) (where gapr(M) is the normalized gap between the r and r+1-st largest-magnitude eigenvector of M), and 2) any Alg which takes fewer than const × r logd/√gap queries fails (with overwhelming probability) to identity a matrix V ∈ ℝd × r with orthonormal columns for which ⟨ V, M V⟩ ≥ (1 − const × gap)∑i=1r λi(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M =W + λ U U⊤, where W is from the Gaussian Orthogonal Ensemble, U is uniform on the d × r-Stieffel manifold and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap−1/2 in the rate follows as a consequence of the correspendence between the asymptotic eigengap and the size of the perturbation λ, when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap−1 and r, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Tight query complexity lower bounds for PCA via finite sample deformed wigner law\",\"authors\":\"Max Simchowitz, A. Alaoui, B. Recht\",\"doi\":\"10.1145/3188745.3188796\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We prove a query complexity lower bound for approximating the top r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝd × d, an algorithm Alg is allowed to make T exact queries of the form w(i) = M v(i) for i in {1,...,T}, where v(i) is drawn from a distribution which depends arbitrarily on the past queries and measurements {v(j),w(i)}1 ≤ j ≤ i−1. We show that for every gap ∈ (0,1/2], there exists a distribution over matrices M for which 1) gapr(M) = Ω(gap) (where gapr(M) is the normalized gap between the r and r+1-st largest-magnitude eigenvector of M), and 2) any Alg which takes fewer than const × r logd/√gap queries fails (with overwhelming probability) to identity a matrix V ∈ ℝd × r with orthonormal columns for which ⟨ V, M V⟩ ≥ (1 − const × gap)∑i=1r λi(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M =W + λ U U⊤, where W is from the Gaussian Orthogonal Ensemble, U is uniform on the d × r-Stieffel manifold and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap−1/2 in the rate follows as a consequence of the correspendence between the asymptotic eigengap and the size of the perturbation λ, when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap−1 and r, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.\",\"PeriodicalId\":20593,\"journal\":{\"name\":\"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3188745.3188796\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3188745.3188796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

摘要

我们证明了近似矩阵的上r维特征空间的查询复杂度下界。我们考虑一个oracle模型,其中,给定一个对称矩阵M∈v x d,允许算法Alg对i在{1,…,T},其中v(i)是从任意依赖于过去查询和测量的分布中得出的{v(j),w(i)}1≤j≤i−1。我们表明,对于每个间隙∈(0,1/2),存在矩阵M上的分布,其中1)gapr(M) = Ω(gap)(其中gapr(M)是r和r+1-st最大特征向量之间的归一化间隙),并且2)任何小于const × r logd/√gap查询的Alg都无法(以压倒性的概率)识别矩阵V∈V x x r,其标准正交列为⟨V, M V⟩≥(1 - const × gap)∑i=1r λi(M)。我们的边界只要求d是1/gap和r中的一个小多项式,并且匹配Musco和Musco ' 15的上界。此外,它将凸优化与“严格鞍形”非凸优化严格区分开来,其中PCA是一个典型的例子:在前者中,一阶方法可以具有无维迭代复杂度,而在PCA中,基于梯度的方法的迭代复杂度必然随着维数的增加而增加。我们的论证通过简化到估计变形Wigner模型M =W + λ U U,其中W来自高斯正交系综,U在d × r-Stieffel流形上是均匀的,λ >1控制扰动的大小。令人惊讶的是,这个无处不在的随机矩阵模型见证了特征空间近似的最坏情况速率,并且当λ接近“相变”λ = 1时,由于渐近特征与扰动λ的大小之间的对应关系,速率中的“加速”间隙- 1/2。为了证明d只需要是gap - 1和r中的多项式,我们证明了变形Wigner矩阵的上特征值的有限样本收敛定理,这可能是一个独立的兴趣。然后,我们使用一种基于截断似然的fano式数据处理不等式的新技术对上述估计问题下界;该技术推广了Chen等人的贝叶斯风险下界[16],我们认为它特别适合于本文所考虑的自适应设置中的下界。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Tight query complexity lower bounds for PCA via finite sample deformed wigner law
We prove a query complexity lower bound for approximating the top r dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix M ∈ ℝd × d, an algorithm Alg is allowed to make T exact queries of the form w(i) = M v(i) for i in {1,...,T}, where v(i) is drawn from a distribution which depends arbitrarily on the past queries and measurements {v(j),w(i)}1 ≤ j ≤ i−1. We show that for every gap ∈ (0,1/2], there exists a distribution over matrices M for which 1) gapr(M) = Ω(gap) (where gapr(M) is the normalized gap between the r and r+1-st largest-magnitude eigenvector of M), and 2) any Alg which takes fewer than const × r logd/√gap queries fails (with overwhelming probability) to identity a matrix V ∈ ℝd × r with orthonormal columns for which ⟨ V, M V⟩ ≥ (1 − const × gap)∑i=1r λi(M). Our bound requires only that d is a small polynomial in 1/gap and r, and matches the upper bounds of Musco and Musco ’15. Moreover, it establishes a strict separation between convex optimization and “strict-saddle” non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. Our argument proceeds via a reduction to estimating a rank-r spike in a deformed Wigner model M =W + λ U U⊤, where W is from the Gaussian Orthogonal Ensemble, U is uniform on the d × r-Stieffel manifold and λ > 1 governs the size of the perturbation. Surprisingly, this ubiquitous random matrix model witnesses the worst-case rate for eigenspace approximation, and the ‘accelerated’ gap−1/2 in the rate follows as a consequence of the correspendence between the asymptotic eigengap and the size of the perturbation λ, when λ is near the “phase transition” λ = 1. To verify that d need only be polynomial in gap−1 and r, we prove a finite sample convergence theorem for top eigenvalues of a deformed Wigner matrix, which may be of independent interest. We then lower bound the above estimation problem with a novel technique based on Fano-style data-processing inequalities with truncated likelihoods; the technique generalizes the Bayes-risk lower bound of Chen et al. ’16, and we believe it is particularly suited to lower bounds in adaptive settings like the one considered in this paper.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Data-dependent hashing via nonlinear spectral gaps Interactive compression to external information The query complexity of graph isomorphism: bypassing distribution testing lower bounds Collusion resistant traitor tracing from learning with errors Explicit binary tree codes with polylogarithmic size alphabet
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1