A Study of the Performance of Multifluid PPM Gas Dynamics on CPUs and GPUs

Pei-Hung Lin, J. Jayaraj, P. Woodward
Published in: 2011 Symposium on Application Accelerators in High-Performance Computing
Publication date: 2011-07-19
DOI: 10.1109/SAAHPC.2011.27 (https://doi.org/10.1109/SAAHPC.2011.27)
Citations: 3

Abstract

The potential for GPUs and many-core CPUs to support high-performance computation in computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multifluid volume fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. That implementation involved a complete restructuring of the code, which has been described in detail elsewhere. Here the lessons learned from that work are exploited to take advantage of today's latest generations of multi-core CPUs and many-core GPUs. The code is first decomposed into a series of individual kernels to allow an implementation on GPUs, and the operations it performs are then characterized in detail. Careful implementations of this code for both CPUs and GPUs are then contrasted from a performance point of view. In addition, a single kernel that has many of the characteristics of the full application on CPUs has been built into a full, standalone, scalable parallel application. This single-kernel application shows the GPU at its best. In contrast, the full multifluid gas dynamics application brings into play computational requirements that highlight the essential differences between today's CPU and GPU designs and the different programming strategies needed to achieve the best performance for applications of this type on the two devices. The single-kernel application code performs extremely well on both platforms. It is not limited by main memory bandwidth on either device; instead, it is limited only by the computational capability of each. In this case the GPU has the advantage, because it has more computational cores. The full multifluid gas dynamics code, however, is of necessity limited by memory bandwidth on the GPU, while it remains limited by computational capability on the CPU. We believe that these codes provide a useful context for quantifying the costs and benefits of design decisions for these powerful new computing devices. Suggestions for improvements in both devices and codes based upon this work are offered in our conclusions.
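
The distinction the abstract draws between a code limited by computational capability and one limited by main memory bandwidth can be illustrated with a simple roofline-style estimate. This is a general technique, not one the abstract describes, and the sketch below is not the authors' analysis. The `Device` class, the function names, and every number are hypothetical placeholders chosen for illustration only; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    peak_gflops: float    # hypothetical peak compute rate, GFLOP/s
    bandwidth_gbs: float  # hypothetical main-memory bandwidth, GB/s


def attainable_gflops(device: Device, flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(device.peak_gflops, device.bandwidth_gbs * flops_per_byte)


def limiting_factor(device: Device, flops_per_byte: float) -> str:
    """Report which resource caps performance at this arithmetic intensity."""
    if device.bandwidth_gbs * flops_per_byte < device.peak_gflops:
        return "memory bandwidth"
    return "compute"


# Placeholder devices; the figures are illustrative, not taken from the paper.
cpu = Device("cpu", peak_gflops=100.0, bandwidth_gbs=25.0)
gpu = Device("gpu", peak_gflops=1000.0, bandwidth_gbs=150.0)

# Arithmetic intensity: floating-point operations per byte of main-memory traffic.
for intensity in (5.0, 20.0):
    for dev in (cpu, gpu):
        print(dev.name, intensity,
              attainable_gflops(dev, intensity),
              limiting_factor(dev, intensity))
```

With these assumed figures, the higher-intensity case leaves both devices compute-limited and favors the device with the larger compute peak, while the lower-intensity case leaves the hypothetical GPU bandwidth-limited and the hypothetical CPU still compute-limited. This mirrors, qualitatively only, the contrast between the single-kernel application and the full application.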