Recursive MaxSquare: Cache-friendly, Parallel, Scalable in situ Rectangular Matrix Transposition

2020 International Conference on Computational Science and Computational Intelligence (CSCI). Pub Date: 2020-12-01. DOI: 10.1109/CSCI51800.2020.00228

Claudio A. Parra, Travis Yu, Kyu Seon Yum, Arturo Garza, I. Scherson


Abstract

An in situ algorithm for transposing rectangular matrices is presented. It recursively partitions the original rectangular matrix into a maximum-size square sub-matrix and a remaining rectangular sub-matrix. To transpose the maximum-size square sub-matrix, a novel cache-friendly, parallel (multithreaded), and scalable in-place square matrix transposition procedure is proposed: it requires a total of Θ(n²/2) simple memory swaps and a single element of temporary storage per thread, and it uses no complex index arithmetic in the main transposition loop. The remaining rectangular sub-matrix is transposed recursively. Dubbed Recursive MaxSquare, the proposed algorithm uses a generalization of the perfect shuffle/unshuffle data permutation to stitch together the recursively transposed square matrices. The shuffle/unshuffle permutations are shown to decompose efficiently into basic vector/segment swaps, exchanges, and/or cyclic shifts (rotations). A balanced, parallel, cycles-based transposition is also proposed for comparison.
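
The two ideas the abstract names most concretely, the swap-only square transposition and the peeling of maximum-size squares, can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' procedure: it is sequential rather than multithreaded, it makes no attempt at cache blocking, and it omits the shuffle/unshuffle stitching step entirely, since the abstract does not give those details. The function names `transpose_square_in_place` and `max_square_partition` are hypothetical, and a row-major flat list is assumed as the storage layout.

```python
def transpose_square_in_place(a, n):
    """Transpose an n x n matrix stored row-major in the flat list `a`.

    Only elements above the diagonal are visited, and each is swapped with
    its mirror below the diagonal, so exactly n*(n-1)/2 swaps are made.
    Returns the number of swaps performed.
    """
    swaps = 0
    for i in range(n):
        for j in range(i + 1, n):
            upper = i * n + j  # position of element (i, j)
            lower = j * n + i  # position of element (j, i)
            a[upper], a[lower] = a[lower], a[upper]
            swaps += 1
    return swaps


def max_square_partition(rows, cols):
    """List the side lengths of the squares obtained by repeatedly peeling
    the largest possible square off a rows x cols rectangle.

    This mirrors only the partitioning idea (largest square plus a
    remaining rectangle); it moves no data.
    """
    sides = []
    while rows > 0 and cols > 0:
        side = min(rows, cols)
        sides.append(side)
        if rows >= cols:
            rows -= side  # remainder is the (rows - side) x cols rectangle
        else:
            cols -= side  # remainder is the rows x (cols - side) rectangle
    return sides


if __name__ == "__main__":
    m = list(range(16))                     # 4 x 4 matrix, row-major
    print(transpose_square_in_place(m, 4))  # 6 swaps, i.e. 4*3/2
    print(max_square_partition(3, 5))       # [3, 2, 1, 1]
```

The swap count n(n-1)/2 is consistent with the Θ(n²/2) figure quoted in the abstract, and the swap itself needs only one temporary element. The partition follows the same subtraction pattern as the Euclidean algorithm, so a 3 x 5 matrix yields squares of side 3, 2, 1, and 1, whose areas sum to 15.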
