A Study of the Performance of Multifluid PPM Gas Dynamics on CPUs and GPUs

2011 Symposium on Application Accelerators in High-Performance Computing
Pub Date: 2011-07-19
DOI: 10.1109/SAAHPC.2011.27

Pei-Hung Lin, J. Jayaraj, P. Woodward


Citations: 3

Abstract

The potential for GPUs and many-core CPUs to support high-performance computation in the area of computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multifluid volume fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. That implementation involved a complete restructuring of the code, which has been described in detail elsewhere. Here the lessons learned from that work are exploited to take advantage of today's latest generations of multi-core CPUs and many-core GPUs. The operations performed by this code are characterized in detail after first being decomposed into a series of individual code kernels to allow an implementation on GPUs. Careful implementations of this code for both CPUs and GPUs are then contrasted from a performance point of view. In addition, a single kernel that has many of the characteristics of the full application on CPUs has been built into a full, standalone, scalable parallel application. This single-kernel application shows the GPU at its best. In contrast, the full multifluid gas dynamics application brings into play computational requirements that highlight the essential differences in CPU and GPU designs today and the different programming strategies needed to achieve the best performance for applications of this type on the two devices. The single-kernel application code performs extremely well on both platforms. This application is not limited by main memory bandwidth on either device; instead, it is limited only by the computational capability of each. In this case, the GPU has the advantage, because it has more computational cores. The full multifluid gas dynamics code is, however, of necessity limited by memory bandwidth on the GPU, while it is still limited by computational capability on the CPU. We believe that these codes provide a useful context for quantifying the costs and benefits of design decisions for these powerful new computing devices. Suggestions for improvements in both devices and codes based upon this work are offered in our conclusions.
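The distinction the abstract draws between a code limited by computational capability and one limited by memory bandwidth can be illustrated with a simple roofline-style estimate. The sketch below is not the paper's analysis: the device figures, the flops-per-byte values, and the names `Device`, `attainable_gflops`, and `limiting_resource` are all hypothetical, chosen only to show how the same code can sit under the compute roof on one device and under the bandwidth roof on another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_gflops: float    # peak arithmetic rate in GFLOP/s (hypothetical value)
    bandwidth_gbs: float  # main-memory bandwidth in GB/s (hypothetical value)


def attainable_gflops(device: Device, flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(device.peak_gflops, flops_per_byte * device.bandwidth_gbs)


def limiting_resource(device: Device, flops_per_byte: float) -> str:
    """Name the roof that a code with this arithmetic intensity hits first."""
    ridge = device.peak_gflops / device.bandwidth_gbs
    return "compute" if flops_per_byte >= ridge else "memory bandwidth"


# Illustrative devices only; these are NOT measurements from the paper.
CPU = Device("cpu", peak_gflops=100.0, bandwidth_gbs=25.0)
GPU = Device("gpu", peak_gflops=1000.0, bandwidth_gbs=150.0)

if __name__ == "__main__":
    # Assumed intensities: a high one standing in for the single-kernel
    # application, a lower one standing in for the full application.
    for label, intensity in [("single kernel", 20.0), ("full code", 5.0)]:
        for dev in (CPU, GPU):
            print(
                f"{label:13s} on {dev.name}: "
                f"{attainable_gflops(dev, intensity):7.1f} GFLOP/s bound, "
                f"limited by {limiting_resource(dev, intensity)}"
            )
```

With these assumed numbers, the high-intensity case is compute limited on both devices, while the low-intensity case stays compute limited on the CPU but becomes bandwidth limited on the GPU, which mirrors the qualitative pattern the abstract reports. A real analysis would also have to account for on-chip memory capacity and data reuse, which this simplification ignores.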

