Beyond symmetry: best submatrix selection for the sparse truncated SVD
Journal: Mathematical Programming (JCR Q2, Computer Science, Software Engineering; Impact Factor 2.2)
DOI: 10.1007/s10107-023-02030-7
Published: 2023-12-21 (Journal Article)
Citations: 0
Abstract
The truncated singular value decomposition (SVD), also known as the best low-rank matrix approximation with minimum error measured by a unitarily invariant norm, has been applied in many domains, such as biology and healthcare, where high-dimensional datasets are prevalent. To extract interpretable information from high-dimensional data, the sparse truncated SVD (SSVD) has been used to select a handful of rows and columns of the original matrix along with the best low-rank approximation. Unlike the existing SSVD literature, which focuses on the top singular value or compromises sparsity for the sake of computational efficiency, this paper presents a novel SSVD formulation that selects the best submatrix, precisely up to a given size, to maximize its truncated Ky Fan norm. Since the proposed SSVD problem is NP-hard, we are motivated to study effective algorithms with provable performance guarantees. To do so, we first reformulate SSVD as a mixed-integer semidefinite program, which can be solved exactly for small- or medium-sized instances within a branch-and-cut algorithmic framework with closed-form cuts, and which is extremely useful for evaluating the solution quality of approximation algorithms. We then develop three selection algorithms based on different selection criteria, as well as two search algorithms, greedy and local search. We prove approximation ratios for all the approximation algorithms and show that all the ratios are tight when the number of rows or columns of the selected submatrix is no larger than half that of the data matrix; that is, the derived approximation ratios are unimprovable. Our numerical study demonstrates the high solution quality and computational efficiency of the proposed algorithms. Finally, all of our analysis extends to row-sparse PCA.
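To make the objective concrete, the sketch below illustrates the two core ingredients the abstract describes: the truncated Ky Fan norm (the sum of the top-r singular values) and a greedy search that adds one row or column at a time to a selected submatrix. This is a minimal illustration only, not the paper's actual algorithms or their guarantees; the function names `ky_fan_norm` and `greedy_submatrix` are invented for this example.

```python
import numpy as np


def ky_fan_norm(M, r):
    """Truncated Ky Fan norm: sum of the r largest singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s[:r]))


def greedy_submatrix(A, k_rows, k_cols, r):
    """Greedily grow a row set and a column set, at each step adding the
    single row or column that most increases the Ky Fan r-norm of the
    selected submatrix. Returns the chosen row and column indices."""
    rows, cols = [], []
    while len(rows) < k_rows or len(cols) < k_cols:
        best = (-1.0, None, None)  # (score, 'row'/'col', index)
        if len(rows) < k_rows:
            # Before any column is chosen, score candidates against all columns.
            cand_cols = cols if cols else list(range(A.shape[1]))
            for i in set(range(A.shape[0])) - set(rows):
                score = ky_fan_norm(A[np.ix_(rows + [i], cand_cols)], r)
                if score > best[0]:
                    best = (score, 'row', i)
        if len(cols) < k_cols:
            cand_rows = rows if rows else list(range(A.shape[0]))
            for j in set(range(A.shape[1])) - set(cols):
                score = ky_fan_norm(A[np.ix_(cand_rows, cols + [j])], r)
                if score > best[0]:
                    best = (score, 'col', j)
        if best[1] == 'row':
            rows.append(best[2])
        else:
            cols.append(best[2])
    return sorted(rows), sorted(cols)
```

Because the Ky Fan norm is unitarily invariant, the norm of any submatrix never exceeds that of the full matrix, which is what makes submatrix selection a meaningful maximization problem rather than a trivial one.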
About the Journal
Mathematical Programming publishes original articles dealing with every aspect of mathematical optimization; that is, everything of direct or indirect use concerning the problem of optimizing a function of many variables, often subject to a set of constraints. This involves theoretical and computational issues as well as application studies. Included, along with the standard topics of linear, nonlinear, integer, conic, stochastic and combinatorial optimization, are techniques for formulating and applying mathematical programming models, convex, nonsmooth and variational analysis, the theory of polyhedra, variational inequalities, and control and game theory viewed from the perspective of mathematical programming.