一种新的用于方阵缓存无关就地转置的三角形空间填充曲线

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI:10.1109/IPDPS54959.2023.00045

J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner

{"title":"一种新的用于方阵缓存无关就地转置的三角形空间填充曲线","authors":"J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner","doi":"10.1109/IPDPS54959.2023.00045","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices\",\"authors\":\"J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner\",\"doi\":\"10.1109/IPDPS54959.2023.00045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.\",\"PeriodicalId\":343684,\"journal\":{\"name\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS54959.2023.00045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

本文提出了一种新的基于三角形空间填充曲线的缓存无关阻塞方案。所提出的阻塞方案减少了三角矩阵遍历在主机内存层次结构中的数据移动，三角矩阵遍历固有地表现出较差的数据局部性，例如方阵的就地转置。我们表明，我们的缓存无关阻塞方案可以在线性时间和恒定内存中迭代地生成，与输入矩阵的下三角形或上三角形中存在的条目数量有关。与经典的递归缓参无关解决方案相比，我们的阻塞方案的迭代特性不会抑制其他必要的优化，例如软件预取。为了评估我们的阻塞方案作为缓存无关策略的可行性，我们将其应用于方阵的就地转置。大量的实验表明，我们的缓存无关转置算法在串行和并行环境中的吞吐量和能源效率方面通常优于缓存感知的最先进算法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices

This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve which preserves data locality. The proposed blocking-scheme reduces the movement of data within the host memory hierarchy for triangular matrix traversals, which inherently exhibit poor data locality, such as the in-place transposition of square matrices. We show that our cache-oblivious blocking-scheme can be generated iteratively in linear time and constant memory with regard to the number of entries present in the lower, or upper, triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking-scheme does not inhibit other essential optimizations such as software prefetching. In order to assess the viability of our blocking-scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in terms of throughput and energy efficiency in sequential as well as parallel environments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量