Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs

K. Ibrahim, Chao Yang, Pieter Maris
Published in: 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 2022-11-01. DOI: https://doi.org/10.1109/P3HPC56579.2022.00011
Citations: 2

Abstract

The emergence of accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector multiplication (SpMM) kernel and discuss techniques for achieving performance portability on NVIDIA- and AMD-based accelerators using CUDA, HIP, OpenACC, and Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52× in the explored problems. Our study examines performance portability aggregation techniques to guide the development and selection of performance-portable algorithmic variants.
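
For readers unfamiliar with the kernel, the following is a minimal reference sketch of a block diagonal matrix multiplied by several vectors at once. It is not the authors' implementation: the function name `block_diag_spmm`, the storage of each diagonal block as a dense list of rows, and the row-major layout of the multivector `X` are all assumptions made for illustration. It only shows the arithmetic that a GPU variant in any of the programming models would have to reproduce.

```python
def block_diag_spmm(blocks, X):
    """Compute Y = A @ X, where A is block diagonal and X holds k vectors.

    blocks: list of square diagonal blocks, each a list of rows
            (hypothetical dense-per-block storage, assumed for illustration).
    X:      list of n rows, each with k entries (one column per vector).
    """
    n = sum(len(block) for block in blocks)
    if len(X) != n:
        raise ValueError("X must have one row per matrix row")
    k = len(X[0]) if X else 0
    Y = [[0.0] * k for _ in range(n)]

    offset = 0  # first global row/column index covered by the current block
    for block in blocks:
        size = len(block)
        for i in range(size):
            for j in range(size):
                a = block[i][j]
                if a == 0.0:
                    continue  # skip explicit zeros inside a block
                # One matrix entry updates all k vectors in the same pass.
                for c in range(k):
                    Y[offset + i][c] += a * X[offset + j][c]
        offset += size
    return Y


# A 3x3 matrix made of one 2x2 block and one 1x1 block, applied to 2 vectors.
blocks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0]]]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
Y = block_diag_spmm(blocks, X)
print(Y)  # [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
```

Each diagonal block touches only its own rows of `X` and `Y`, so the blocks can be processed independently. That independence is what makes the kernel a natural candidate for GPU parallelization, although how work is mapped to threads in the paper's variants is not stated in the abstract.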