A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices

J. N. F. Alves, L. Russo, Alexandre P. Francisco, S. Benkner
Published in: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023-05-01
DOI: 10.1109/IPDPS54959.2023.00045 (https://doi.org/10.1109/IPDPS54959.2023.00045)

Abstract

This paper proposes a novel cache-oblivious blocking scheme based on a new triangular space-filling curve that preserves data locality. The proposed blocking scheme reduces data movement within the host memory hierarchy for triangular matrix traversals, such as the in-place transposition of square matrices, which inherently exhibit poor data locality. We show that our cache-oblivious blocking scheme can be generated iteratively in linear time and constant memory with respect to the number of entries in the lower (or upper) triangle of the input matrix. In contrast to classical recursive cache-oblivious solutions, the iterative nature of our blocking scheme does not inhibit other essential optimizations such as software prefetching. To assess the viability of our blocking scheme as a cache-oblivious strategy, we applied it to the in-place transposition of square matrices. Extensive experiments show that our cache-oblivious transposition algorithm generally outperforms the cache-aware state-of-the-art algorithm in throughput and energy efficiency, in both sequential and parallel environments.
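
For orientation, the triangular traversal that the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's space-filling curve: `transpose_naive` walks the strict lower triangle row by row, and `transpose_tiled` visits the same triangle tile by tile in the manner of a cache-aware baseline. The function names, the flat row-major list layout, and the `tile` parameter are assumptions made for this example only.

```python
def transpose_naive(a, n):
    """Transpose an n x n row-major matrix (flat list) in place.

    Visits each entry of the strict lower triangle once and swaps it
    with its mirror entry, so it performs n * (n - 1) / 2 swaps.
    """
    for i in range(1, n):
        for j in range(i):
            a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]


def transpose_tiled(a, n, tile=4):
    """Same result, but the lower triangle is visited tile by tile.

    `tile` is a hypothetical tuning parameter: a cache-aware scheme must
    pick it per machine, which a cache-oblivious scheme avoids.
    """
    for bi in range(0, n, tile):
        for bj in range(0, bi + 1, tile):
            for i in range(bi, min(bi + tile, n)):
                # Diagonal tiles only swap entries below the main diagonal.
                j_end = min(bj + tile, i) if bi == bj else min(bj + tile, n)
                for j in range(bj, j_end):
                    a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
```

Both functions touch only the entries of one triangle and use constant extra memory. They differ in the order in which the triangle is visited, and that order is what a blocking scheme controls.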