Batched computation of the singular value decompositions of order two by the AVX-512 vectorization

Vedran Novaković
Parallel Process. Lett., published 2020-05-15. DOI: 10.1142/S0129626420500152 (https://doi.org/10.1142/S0129626420500152)
Citations: 4

Abstract

This paper proposes a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\Sigma V^{\ast}$) of real or complex matrices of order two. The algorithm extends to a batch of such matrices of arbitrary length $n$, which arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales the input matrix $A$, in most instances without error, so that its singular values $\Sigma_{ii}$ cannot overflow whenever its elements are finite. It then computes the URV factorization of the scaled matrix, followed by the SVD of the non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, in which the same-indexed elements of the input matrices, and likewise of the output matrices, form vectors, and the steps of the algorithm over such vectors are described. The vectorized approach is shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\times 2$ SVD.
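The pipeline summarized in the abstract can be illustrated with a minimal scalar sketch in Python. It is not the paper's algorithm: it handles only real matrices, computes only the singular values, uses a plain QR step (a special case of a URV factorization with $V=I$) and a hypot-based closed form for the triangular factor, and picks a simple scaling target (largest element in $[0.5, 1)$) that is an assumption of this example. The function names `scale_exponent`, `singular_values_2x2`, and `batch_singular_values` are hypothetical. The batch routine takes four lists holding the same-indexed elements of all matrices, mirroring the layout in which eight consecutive doubles of each list would fill one 512-bit register.

```python
import math

def scale_exponent(a11, a21, a12, a22):
    """Exponent e such that the largest |element| * 2**(-e) lies in [0.5, 1).

    Assumes all elements are finite; returns 0 for the zero matrix.
    """
    exps = [math.frexp(x)[1] for x in (a11, a21, a12, a22) if x != 0.0]
    return max(exps) if exps else 0

def singular_values_2x2(a11, a21, a12, a22):
    """Return (s1, s2, e) with the true singular values s1*2**e >= s2*2**e.

    The scaled values s1, s2 cannot overflow; undoing the scaling might.
    """
    e = scale_exponent(a11, a21, a12, a22)
    # Scaling by a power of two is exact unless a tiny element underflows.
    b11, b21, b12, b22 = (math.ldexp(x, -e) for x in (a11, a21, a12, a22))

    # QR step: rotate the first column onto the first axis, R = [[r11, r12], [0, r22]].
    r11 = math.hypot(b11, b21)
    if r11 == 0.0:
        c, s = 1.0, 0.0
    else:
        c, s = b11 / r11, b21 / r11
    r12 = c * b12 + s * b22
    r22 = c * b22 - s * b12

    # For upper-triangular R: (s1 + s2)^2 = (r11 + |r22|)^2 + r12^2,
    #                         (s1 - s2)^2 = (r11 - |r22|)^2 + r12^2.
    p = math.hypot(r11 + abs(r22), r12)
    q = math.hypot(r11 - abs(r22), r12)
    s1 = 0.5 * (p + q)
    # s1 * s2 = r11 * |r22|; divide first so the product cannot overflow.
    s2 = (r11 / s1) * abs(r22) if s1 > 0.0 else 0.0
    return s1, s2, e

def batch_singular_values(A11, A21, A12, A22):
    """Process a batch stored as four element-wise lists (structure of arrays)."""
    S1, S2, E = [], [], []
    for a11, a21, a12, a22 in zip(A11, A21, A12, A22):
        s1, s2, e = singular_values_2x2(a11, a21, a12, a22)
        S1.append(s1)
        S2.append(s2)
        E.append(e)
    return S1, S2, E
```

Returning the scaled singular values together with the exponent, rather than the unscaled values, is a design choice of this sketch: for a matrix whose elements are all equal to the largest finite double, the larger singular value is twice that number and is not representable, while the pair (scaled value, exponent) still is. When the result is known to be in range, `math.ldexp(s1, e)` recovers the singular value.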