{"title":"AVX-512矢量化的二阶奇异值分解的批量计算","authors":"Vedran Novakovi'c","doi":"10.1142/S0129626420500152","DOIUrl":null,"url":null,"abstract":"In this paper a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\\Sigma V^{\\ast}$) of real or complex matrices of order two is proposed. The algorithm extends to a batch of matrices of an arbitrary length $n$, that arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales, in most instances error-free, the input matrix $A$ such that its singular values $\\Sigma_{ii}$ cannot overflow whenever its elements are finite, and then computes the URV factorization of the scaled matrix, followed by the SVD of a non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, where the same-indexed elements of each of the input and the output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\\times 2$ SVD.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Batched computation of the singular value decompositions of order two by the AVX-512 vectorization\",\"authors\":\"Vedran Novakovi'c\",\"doi\":\"10.1142/S0129626420500152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\\\\Sigma V^{\\\\ast}$) of real or complex matrices of order two is proposed. The algorithm extends to a batch of matrices of an arbitrary length $n$, that arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales, in most instances error-free, the input matrix $A$ such that its singular values $\\\\Sigma_{ii}$ cannot overflow whenever its elements are finite, and then computes the URV factorization of the scaled matrix, followed by the SVD of a non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, where the same-indexed elements of each of the input and the output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\\\\times 2$ SVD.\",\"PeriodicalId\":422436,\"journal\":{\"name\":\"Parallel Process. Lett.\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Parallel Process. Lett.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/S0129626420500152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Process. Lett.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129626420500152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
摘要
本文提出了一种同时计算2阶实数或复数矩阵最多8个奇异值分解的矢量化算法,每个奇异值分解的形式为$ a =U\Sigma V^{\ast}$。该算法扩展到任意长度$n$的一批矩阵,例如,出现在并行Kogbetliantz算法的湮灭部分,用于求阶为$2n$的方阵的SVD。首先推导了单二阶矩阵的奇异值分解算法。在大多数情况下,它对输入矩阵$A$进行缩放,使其奇异值$\Sigma_{ii}$在其元素有限时不会溢出,然后计算缩放后的矩阵的URV分解,然后计算非负上三角形中间因子的SVD。然后引入批处理的向量友好型数据布局,其中每个输入和输出矩阵的相同索引元素形成向量,并描述算法在这些向量上的步骤。然后,向量化方法被证明比单独处理每个矩阵快三倍左右,同时对$2\ × 2$ SVD的精度略高于直接方法。
Batched computation of the singular value decompositions of order two by the AVX-512 vectorization
In this paper a vectorized algorithm for simultaneously computing up to eight singular value decompositions (SVDs, each of the form $A=U\Sigma V^{\ast}$) of real or complex matrices of order two is proposed. The algorithm extends to a batch of matrices of an arbitrary length $n$, that arises, for example, in the annihilation part of the parallel Kogbetliantz algorithm for the SVD of a square matrix of order $2n$. The SVD algorithm for a single matrix of order two is derived first. It scales, in most instances error-free, the input matrix $A$ such that its singular values $\Sigma_{ii}$ cannot overflow whenever its elements are finite, and then computes the URV factorization of the scaled matrix, followed by the SVD of a non-negative upper-triangular middle factor. A vector-friendly data layout for the batch is then introduced, where the same-indexed elements of each of the input and the output matrices form vectors, and the algorithm's steps over such vectors are described. The vectorized approach is then shown to be about three times faster than processing each matrix in isolation, while slightly improving accuracy over the straightforward method for the $2\times 2$ SVD.