Generalized GPU Acceleration for Applications Employing Finite-Volume Methods

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) Pub Date : 2016-05-16 DOI:10.1109/CCGrid.2016.30

Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Shizhen Xu, Wenlai Zhao, Xinliang Wang, Bingwei Chen, Guangwen Yang

Citations: 4

Abstract

Scientific HPC applications are increasingly ported to GPUs to benefit from their high throughput and computing capacity. Many of these applications, such as atmospheric modeling and hydraulic erosion simulation, adopt the finite volume method (FVM) as the solver algorithm. However, the communication components inside these applications generally lead to a low flop-to-byte ratio and inefficient utilization of GPU resources. This paper aims at optimizing FVM solvers based on structured meshes. Besides a high-level overview of the finite-volume method and its basic optimizations on modern GPU platforms, we present two generalized tuning techniques: an explicit cache mechanism and an inner-thread rescheduling method that tries to achieve a suitable mapping between the algorithm's features and the platform architecture. Finally, we demonstrate the impact of our generalized optimization methods on two typical atmospheric dynamics kernels (Euler and SWE) across four mainstream GPU platforms. According to the experimental results on a Tesla K80, speedups of 24.4x for SWE and 31.5x for Euler are achieved over a 12-core Intel E5-2697 CPU, a substantial improvement over the original speedups (18x and 15.47x) obtained without these two methods.
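
To make the abstract's point about neighbor communication on a structured mesh concrete, the following is a minimal sketch of one finite-volume update in one dimension. It is not taken from the paper and is not one of its Euler or SWE kernels: the function name `upwind_step`, the first-order upwind flux, the constant positive velocity, and the periodic boundary are all assumptions chosen for illustration. Each cell does only a few arithmetic operations per value it reads from itself and its neighbor, which is the kind of access pattern that tends to give a low flop-to-byte ratio.

```python
def upwind_step(u, courant):
    """Advance cell averages by one first-order upwind finite-volume step.

    Illustrative assumptions (not from the paper): 1D structured mesh,
    constant positive velocity, periodic boundaries.
    courant = velocity * dt / dx, expected to lie in [0, 1].
    """
    n = len(u)
    # left_flux[i] is the flux through the left face of cell i, already
    # scaled by dt / dx. With positive velocity it comes from cell i - 1;
    # for i == 0, u[-1] wraps around and supplies the periodic boundary.
    left_flux = [courant * u[i - 1] for i in range(n)]
    # Conservative update: each cell changes by (flux in) minus (flux out).
    # The right face of cell i is the left face of cell i + 1.
    return [u[i] - (left_flux[(i + 1) % n] - left_flux[i]) for i in range(n)]


cells = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
shifted = upwind_step(cells, 1.0)  # Courant number 1 moves the profile one cell
half = upwind_step(cells, 0.5)     # smaller step; the total is still conserved
```

Because every face flux is subtracted from one cell and added to its neighbor, the sum over all cells is unchanged by the update, which is the defining property of a finite-volume scheme.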
