Givens rotations for QR decomposition, SVD and PCA over database joins

Dan Olteanu, Nils Vortmeier, Đorđe Živanović
The VLDB Journal, published 2023-11-23. DOI: https://doi.org/10.1007/s00778-023-00818-9
Citations: 2

Abstract

This article introduces FiGaRo, an algorithm for computing the upper-triangular matrix in the QR decomposition of the matrix defined by the natural join over relational data. FiGaRo's main novelty is that it pushes the QR decomposition past the join. This leads to several desirable properties. For acyclic joins, it takes time linear in the database size and independent of the join size. Its execution is equivalent to the application of a sequence of Givens rotations whose length is proportional to the join size. Its number of rounding errors relative to the classical QR decomposition algorithms is on par with the database size relative to the join output size. The QR decomposition lies at the core of many linear algebra computations, including the singular value decomposition (SVD) and principal component analysis (PCA). We show how FiGaRo can be used to compute the orthogonal matrix in the QR decomposition, the SVD, and the PCA of the join output without materializing the join output. A suite of experiments validates that FiGaRo can outperform the LAPACK library Intel MKL in both runtime performance and numerical accuracy by a factor proportional to the gap between the sizes of the join output and input.
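
To make the objects in the abstract concrete, here is a minimal sketch of the classical baseline: materialize the join, then zero out the entries below the diagonal one Givens rotation at a time to obtain the upper-triangular factor R. This is not FiGaRo, which avoids materializing the join. The relations `S` and `T`, their values, and the helper names `givens_qr_r` and `gram` are all hypothetical and chosen only for illustration.

```python
import math

def givens_qr_r(rows):
    """Upper-triangular factor R of a matrix given as a list of rows.

    Classical Givens QR on a materialized matrix (assumption: this is an
    illustrative baseline, not the FiGaRo algorithm from the article).
    """
    a = [[float(v) for v in row] for row in rows]  # work on a copy
    m, n = len(a), len(a[0])
    for j in range(min(m, n)):
        for i in range(j + 1, m):
            x, y = a[j][j], a[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            r = math.hypot(x, y)
            c, s = x / r, y / r
            # The rotation maps (x, y) to (r, 0); set these exactly.
            a[j][j], a[i][j] = r, 0.0
            # Apply the same rotation to the remaining columns of both rows.
            for k in range(j + 1, n):
                top, bot = a[j][k], a[i][k]
                a[j][k] = c * top + s * bot
                a[i][k] = -s * top + c * bot
    return a[:min(m, n)]

def gram(mat):
    """Gram matrix mat^T mat, used to compare a matrix with its R factor."""
    n = len(mat[0])
    return [[sum(row[i] * row[j] for row in mat) for j in range(n)]
            for i in range(n)]

# Hypothetical relations S(key, x) and T(key, y), joined on key.
S = [(1, 1.0), (1, 2.0), (2, 3.0)]
T = [(1, 4.0), (1, 5.0), (2, 6.0)]

# Materialized natural join, keeping only the numeric columns (x, y).
join_rows = [(x, y) for (ks, x) in S for (kt, y) in T if ks == kt]

R = givens_qr_r(join_rows)
```

Because the rotations are orthogonal, the join matrix A and R have the same Gram matrix (AᵀA = RᵀR), so R has the same singular values as A. That is why computing R is enough as a starting point for the SVD and PCA of the join output. In this sketch the number of rotations grows with the number of join rows, which is the cost that pushing the decomposition past the join is meant to avoid.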
