{"title":"Improved testing of low rank matrices","authors":"Yi Li, Zhengyu Wang, David P. Woodruff","doi":"10.1145/2623330.2623736","DOIUrl":null,"url":null,"abstract":"We study the problem of determining if an input matrix A εRm x n can be well-approximated by a low rank matrix. Specifically, we study the problem of quickly estimating the rank or stable rank of A, the latter often providing a more robust measure of the rank. Since we seek significantly sublinear time algorithms, we cast these problems in the property testing framework. In this framework, A either has low rank or stable rank, or is far from having this property. The algorithm should read only a small number of entries or rows of A and decide which case A is in with high probability. If neither case occurs, the output is allowed to be arbitrary. We consider two notions of being far: (1) A requires changing at least an ε-fraction of its entries, or (2) A requires changing at least an ε-fraction of its rows. We call the former the \"entry model\" and the latter the \"row model\". We show: For testing if a matrix has rank at most d in the entry model, we improve the previous number of entries of A that need to be read from O(d2/ε2) (Krauthgamer and Sasson, SODA 2003) to O(d2/ε). Our algorithm is the first to adaptively query the entries of A, which for constant d we show is necessary to achieve O(1/ε) queries. For the important case of d = 1 we also give a new non-adaptive algorithm, improving the previous O(1/ε2) queries to O(log2(1/ε) / ε). For testing if a matrix has rank at most d in the row model, we prove an Ω(d/ε) lower bound on the number of rows that need to be read, even for adaptive algorithms. Our lower bound matches a non-adaptive upper bound of Krauthgamer and Sasson. For testing if a matrix has stable rank at most d in the row model or requires changing an ε/d-fraction of its rows in order to have stable rank at most d, we prove that reading θ(d/ε2) rows is necessary and sufficient. We also give an empirical evaluation of our rank and stable rank algorithms on real and synthetic datasets.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"78 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2623330.2623736","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
We study the problem of determining if an input matrix A εRm x n can be well-approximated by a low rank matrix. Specifically, we study the problem of quickly estimating the rank or stable rank of A, the latter often providing a more robust measure of the rank. Since we seek significantly sublinear time algorithms, we cast these problems in the property testing framework. In this framework, A either has low rank or stable rank, or is far from having this property. The algorithm should read only a small number of entries or rows of A and decide which case A is in with high probability. If neither case occurs, the output is allowed to be arbitrary. We consider two notions of being far: (1) A requires changing at least an ε-fraction of its entries, or (2) A requires changing at least an ε-fraction of its rows. We call the former the "entry model" and the latter the "row model". We show: For testing if a matrix has rank at most d in the entry model, we improve the previous number of entries of A that need to be read from O(d2/ε2) (Krauthgamer and Sasson, SODA 2003) to O(d2/ε). Our algorithm is the first to adaptively query the entries of A, which for constant d we show is necessary to achieve O(1/ε) queries. For the important case of d = 1 we also give a new non-adaptive algorithm, improving the previous O(1/ε2) queries to O(log2(1/ε) / ε). For testing if a matrix has rank at most d in the row model, we prove an Ω(d/ε) lower bound on the number of rows that need to be read, even for adaptive algorithms. Our lower bound matches a non-adaptive upper bound of Krauthgamer and Sasson. For testing if a matrix has stable rank at most d in the row model or requires changing an ε/d-fraction of its rows in order to have stable rank at most d, we prove that reading θ(d/ε2) rows is necessary and sufficient. We also give an empirical evaluation of our rank and stable rank algorithms on real and synthetic datasets.
研究了输入矩阵A εRm x n能否被低秩矩阵很好地逼近的问题。具体来说,我们研究了快速估计A的秩或稳定秩的问题,后者通常提供一个更稳健的秩度量。由于我们寻求重要的次线性时间算法,我们将这些问题置于性质测试框架中。在这个框架中,A要么具有低秩,要么具有稳定秩,要么远不具有这种性质。算法应该只读取a的少量条目或行,并确定a处于高概率的哪种情况。如果这两种情况都不发生,则允许输出是任意的。我们考虑两个远的概念:(1)A需要改变它的至少一个ε-分数的条目,或者(2)A需要改变它的至少一个ε-分数的行。我们称前者为“入口模型”,后者为“行模型”。为了测试一个矩阵在条目模型中是否排名最多为d,我们将a的先前需要读取的条目数从O(d2/ε2) (Krauthgamer and Sasson, SODA 2003)提高到O(d2/ε)。我们的算法是第一个自适应查询A的条目的算法,对于常数d,我们证明了实现O(1/ε)查询是必要的。对于d = 1的重要情况,我们还给出了一种新的非自适应算法,将之前的O(1/ε2)查询改进为O(log2(1/ε) /ε)。为了测试一个矩阵在行模型中是否秩最多为d,我们证明了需要读取的行数的Ω(d/ε)下界,即使对于自适应算法也是如此。我们的下界与Krauthgamer和Sasson的非自适应上界相匹配。为了检验一个矩阵在行模型中是否具有最多d的稳定秩,或者是否需要改变其行的ε/d分数才能具有最多d的稳定秩,我们证明了读取θ(d/ε2)行是必要和充分的。我们还在真实数据集和合成数据集上对我们的秩和稳定秩算法进行了实证评估。