Beyond symmetry: best submatrix selection for the sparse truncated SVD
Journal: Mathematical Programming (JCR Q2, Computer Science, Software Engineering; Impact Factor 2.2)
DOI: 10.1007/s10107-023-02030-7
Published: 2023-12-21 (Journal Article)
Citations: 0
Abstract
The truncated singular value decomposition (SVD), also known as the best low-rank matrix approximation with minimum error measured by a unitarily invariant norm, has been applied in many domains, such as biology and healthcare, where high-dimensional datasets are prevalent. To extract interpretable information from high-dimensional data, the sparse truncated SVD (SSVD) has been used to select a handful of rows and columns of the original matrix along with the best low-rank approximation. Unlike the existing SSVD literature, which focuses on the top singular value or compromises sparsity for the sake of computational efficiency, this paper presents a novel SSVD formulation that selects the best submatrix, precisely up to a given size, to maximize its truncated Ky Fan norm. Since the proposed SSVD problem is NP-hard, we are motivated to study effective algorithms with provable performance guarantees. To do so, we first reformulate SSVD as a mixed-integer semidefinite program, which can be solved exactly for small- or medium-sized instances within a branch-and-cut algorithmic framework with closed-form cuts, and which is extremely useful for evaluating the solution quality of approximation algorithms. We then develop three selection algorithms based on different selection criteria, as well as two search algorithms, greedy and local search. We prove approximation ratios for all the approximation algorithms and show that all the ratios are tight when the number of rows or columns of the selected submatrix is no larger than half that of the data matrix; that is, the derived approximation ratios are unimprovable. Our numerical study demonstrates the high solution quality and computational efficiency of the proposed algorithms. Finally, all of our analysis extends to row-sparse PCA.
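To make the objective concrete, the sketch below illustrates the two core ingredients the abstract describes: the truncated Ky Fan norm (the sum of the top-r singular values) and a greedy search that adds one row or column at a time to a selected submatrix. This is a minimal illustration only, not the paper's actual algorithms or their guarantees; the function names `ky_fan_norm` and `greedy_submatrix` are invented for this example.

```python
import numpy as np


def ky_fan_norm(M, r):
    """Truncated Ky Fan norm: sum of the r largest singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s[:r]))


def greedy_submatrix(A, k_rows, k_cols, r):
    """Greedily grow a row set and a column set, at each step adding the
    single row or column that most increases the Ky Fan r-norm of the
    selected submatrix. Returns the chosen row and column indices."""
    rows, cols = [], []
    while len(rows) < k_rows or len(cols) < k_cols:
        best = (-1.0, None, None)  # (score, 'row'/'col', index)
        if len(rows) < k_rows:
            # Before any column is chosen, score candidates against all columns.
            cand_cols = cols if cols else list(range(A.shape[1]))
            for i in set(range(A.shape[0])) - set(rows):
                score = ky_fan_norm(A[np.ix_(rows + [i], cand_cols)], r)
                if score > best[0]:
                    best = (score, 'row', i)
        if len(cols) < k_cols:
            cand_rows = rows if rows else list(range(A.shape[0]))
            for j in set(range(A.shape[1])) - set(cols):
                score = ky_fan_norm(A[np.ix_(cand_rows, cols + [j])], r)
                if score > best[0]:
                    best = (score, 'col', j)
        if best[1] == 'row':
            rows.append(best[2])
        else:
            cols.append(best[2])
    return sorted(rows), sorted(cols)
```

Because the Ky Fan norm is unitarily invariant, the norm of any submatrix never exceeds that of the full matrix, which is what makes submatrix selection a meaningful maximization problem rather than a trivial one.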
About the Journal
Mathematical Programming publishes original articles dealing with every aspect of mathematical optimization; that is, everything of direct or indirect use concerning the problem of optimizing a function of many variables, often subject to a set of constraints. This involves theoretical and computational issues as well as application studies. Included, along with the standard topics of linear, nonlinear, integer, conic, stochastic and combinatorial optimization, are techniques for formulating and applying mathematical programming models, convex, nonsmooth and variational analysis, the theory of polyhedra, variational inequalities, and control and game theory viewed from the perspective of mathematical programming.