SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation

Gagandeep Singh, Alireza Khodamoradi, K. Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, H. Corporaal, O. Mutlu
Venue: Proceedings of the 37th International Conference on Supercomputing
Publication date: 2023-03-06
DOI: 10.1145/3577193.3593719 (https://doi.org/10.1145/3577193.3593719)
Citations: 3

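Abstract

Fast and accurate climate simulations and weather predictions are critical for understanding and preparing for the impact of climate change. Real-world climate and weather simulations involve complex compound stencil kernels, which combine several different stencils. Horizontal diffusion is one such important compound stencil, found in many climate and weather prediction models. Its computation involves a large amount of data access and manipulation, which leads to two main issues on current computing systems. First, such compound stencils have high memory bandwidth demands because they require large amounts of data access. Second, compound stencils have complex data access patterns and poor data locality, as the memory access pattern is typically irregular with low arithmetic intensity. As a result, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. Recent works propose using FPGAs as an alternative to traditional CPU- and GPU-based systems to accelerate weather stencil kernels. However, we observe that stencil computation cannot leverage the bit-level flexibility available on an FPGA because of its complex memory access patterns, leading to high hardware resource utilization and low peak performance. We introduce SPARTA, a novel spatial accelerator for horizontal diffusion weather stencil computation. We exploit the two-dimensional spatial architecture to efficiently accelerate the horizontal diffusion stencil by designing the first scaled-out spatial accelerator using the MLIR (Multi-Level Intermediate Representation) compiler framework. We evaluate SPARTA on a real cutting-edge AMD-Xilinx Versal AI Engine (AIE) spatial architecture. Our real-system evaluation results demonstrate that SPARTA outperforms state-of-the-art CPU, GPU, and FPGA implementations by 17.1×, 1.2×, and 2.1×, respectively. Compared to the most energy-efficient design on an HBM-based FPGA, SPARTA provides 2.43× higher energy efficiency. Our results reveal that balancing workload across the available processing resources is crucial for achieving high performance on spatial architectures. We also implement and evaluate five elementary stencils that are commonly used as benchmarks in stencil computation research. We open-source all our implementations at https://github.com/CMU-SAFARI/SPARTA to aid future research in stencil computation and spatial computing systems.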
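The abstract describes horizontal diffusion as a compound stencil built from several simpler stencils. The Python sketch below illustrates that structure under a stated assumption: it follows the widely published COSMO/GridTools-style reference formulation (a 5-point Laplacian, limited fluxes in i and j, and an output update), with a diffusion coefficient field `coeff` supplied by the caller. It is not SPARTA's AI Engine kernel code, and the function name `horizontal_diffusion` is hypothetical.

```python
"""Minimal sketch of a COSMO/GridTools-style horizontal diffusion compound stencil.

Assumption: this follows the commonly published reference formulation
(Laplacian -> limited i/j fluxes -> output update). It is an illustration,
not the SPARTA implementation.
"""
from typing import List

Grid = List[List[float]]


def horizontal_diffusion(inp: Grid, coeff: Grid) -> Grid:
    """Return one horizontal-diffusion step applied to `inp` (hypothetical name).

    Each output point reads input data up to 2 cells away, so the outermost
    2 rows/columns act as a halo and are copied through unchanged.
    """
    ni, nj = len(inp), len(inp[0])
    out = [row[:] for row in inp]

    # Stage 1: 5-point Laplacian, computed one cell beyond the output region
    # because the flux stages read it at shifted offsets.
    lap = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            lap[i][j] = 4.0 * inp[i][j] - (
                inp[i + 1][j] + inp[i - 1][j] + inp[i][j + 1] + inp[i][j - 1]
            )

    # Stage 2: flux in i; the limiter zeroes it when it has the same sign
    # as the local input gradient.
    flx = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 2):
        for j in range(2, nj - 2):
            f = lap[i + 1][j] - lap[i][j]
            flx[i][j] = 0.0 if f * (inp[i + 1][j] - inp[i][j]) > 0 else f

    # Stage 3: flux in j, with the same limiter.
    fly = [[0.0] * nj for _ in range(ni)]
    for i in range(2, ni - 2):
        for j in range(1, nj - 2):
            f = lap[i][j + 1] - lap[i][j]
            fly[i][j] = 0.0 if f * (inp[i][j + 1] - inp[i][j]) > 0 else f

    # Stage 4: combine the flux differences into the output field.
    for i in range(2, ni - 2):
        for j in range(2, nj - 2):
            out[i][j] = inp[i][j] - coeff[i][j] * (
                flx[i][j] - flx[i - 1][j] + fly[i][j] - fly[i][j - 1]
            )
    return out


if __name__ == "__main__":
    n = 7
    field = [[0.0] * n for _ in range(n)]
    field[3][3] = 1.0  # a single spike at the centre of the grid
    coeff = [[0.025] * n for _ in range(n)]
    for row in horizontal_diffusion(field, coeff):
        print(" ".join(f"{v:+.3f}" for v in row))
```

Each stage writes a temporary array that the next stage reads at shifted offsets, which is one way to see the data movement and limited reuse the abstract attributes to compound stencils. Because every output point depends only on inputs within two cells in i and j, a grid could be split into row blocks that each carry a 2-row halo and be distributed across processing elements; the sketch itself runs sequentially.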