Beyond symmetry: best submatrix selection for the sparse truncated SVD

Mathematical Programming | IF 2.2 | CAS Zone 2 (Mathematics) | JCR Q2 (Computer Science, Software Engineering) | Pub Date: 2023-12-21 | DOI: 10.1007/s10107-023-02030-7
Citations: 0

Abstract

The truncated singular value decomposition (SVD), also known as the best low-rank matrix approximation with minimum error measured by a unitarily invariant norm, has been applied in many domains, such as biology and healthcare, where high-dimensional datasets are prevalent. To extract interpretable information from high-dimensional data, sparse truncated SVD (SSVD) has been used to select a handful of rows and columns of the original matrix along with the best low-rank approximation. Unlike the existing SSVD literature, which focuses on the top singular value or compromises sparsity for the sake of computational efficiency, this paper presents a novel SSVD formulation that can select the best submatrix precisely up to a given size to maximize its truncated Ky Fan norm. The fact that the proposed SSVD problem is NP-hard motivates us to study effective algorithms with provable performance guarantees. To do so, we first reformulate SSVD as a mixed-integer semidefinite program, which can be solved exactly for small- or medium-sized instances within a branch-and-cut algorithm framework with closed-form cuts and is extremely useful for evaluating the solution quality of approximation algorithms. We next develop three selection algorithms based on different selection criteria and two search algorithms, greedy and local search. We prove approximation ratios for all the approximation algorithms and show that all the ratios are tight when the number of rows or columns of the selected submatrix is no larger than half that of the data matrix, i.e., our derived approximation ratios are unimprovable. Our numerical study demonstrates the high solution quality and computational efficiency of the proposed algorithms. Finally, all our analysis can be extended to row-sparse PCA.
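To make the objective in the abstract concrete, the following is a minimal sketch, not the paper's algorithms. It assumes the Ky Fan k-norm of a submatrix is the sum of its k largest singular values, and it assumes a simple greedy rule (add whichever row or column index increases that norm the most, seeded by the largest-magnitude entry). The function names `ky_fan_norm`, `brute_force`, and `greedy`, and the parameters `s1`, `s2`, `k`, are hypothetical and chosen for illustration only. The exhaustive search is included solely as a reference point on tiny instances.

```python
import itertools
import numpy as np


def ky_fan_norm(M, k):
    """Sum of the k largest singular values of M (Ky Fan k-norm)."""
    if M.size == 0:
        return 0.0
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(s[:k].sum())


def brute_force(A, s1, s2, k):
    """Exhaustive search over all s1-by-s2 submatrices (tiny A only)."""
    m, n = A.shape
    best_val, best_rows, best_cols = -1.0, None, None
    for rows in itertools.combinations(range(m), s1):
        for cols in itertools.combinations(range(n), s2):
            val = ky_fan_norm(A[np.ix_(rows, cols)], k)
            if val > best_val:
                best_val, best_rows, best_cols = val, list(rows), list(cols)
    return best_val, best_rows, best_cols


def greedy(A, s1, s2, k):
    """Illustrative greedy heuristic (an assumption, not the paper's rule).

    Starts from the largest-magnitude entry and repeatedly adds the single
    row or column index that yields the largest Ky Fan k-norm, until the
    submatrix has s1 rows and s2 columns. Requires s1 <= m and s2 <= n.
    """
    m, n = A.shape
    i0, j0 = np.unravel_index(np.argmax(np.abs(A)), A.shape)
    rows, cols = [int(i0)], [int(j0)]
    while len(rows) < s1 or len(cols) < s2:
        candidates = []
        if len(rows) < s1:
            candidates += [(rows + [i], cols) for i in range(m) if i not in rows]
        if len(cols) < s2:
            candidates += [(rows, cols + [j]) for j in range(n) if j not in cols]
        rows, cols = max(
            candidates,
            key=lambda rc: ky_fan_norm(A[np.ix_(rc[0], rc[1])], k),
        )
    return ky_fan_norm(A[np.ix_(rows, cols)], k), sorted(rows), sorted(cols)


# Small synthetic instance (random data, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
g_val, g_rows, g_cols = greedy(A, 3, 2, k=2)
b_val, b_rows, b_cols = brute_force(A, 3, 2, k=2)
print("greedy:", g_val, g_rows, g_cols)
print("exact :", b_val, b_rows, b_cols)
```

On instances this small, comparing the greedy value against the exhaustive optimum gives a rough sense of the gap that an approximation ratio would bound. The exhaustive search grows combinatorially with the matrix size, which is consistent with the abstract's motivation for approximation algorithms.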

Source journal: Mathematical Programming
Category: Mathematics; Computer Science, Software Engineering
CiteScore: 5.70
Self-citation rate: 11.10%
Articles published: 160
Review time: 4-8 weeks
About the journal: Mathematical Programming publishes original articles dealing with every aspect of mathematical optimization; that is, everything of direct or indirect use concerning the problem of optimizing a function of many variables, often subject to a set of constraints. This involves theoretical and computational issues as well as application studies. Included, along with the standard topics of linear, nonlinear, integer, conic, stochastic and combinatorial optimization, are techniques for formulating and applying mathematical programming models, convex, nonsmooth and variational analysis, the theory of polyhedra, variational inequalities, and control and game theory viewed from the perspective of mathematical programming.