Evaluation of a performance portable lattice Boltzmann code using OpenCL
Simon McIntosh-Smith, Dan Curran. International Workshop on OpenCL (IWOCL), 2014-05-12, pp. 2:1-2:12. DOI: 10.1145/2664666.2664668
With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD and, more recently, Intel's Xeon Phi, ensuring the performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area, structured grid codes, and investigated techniques that exploit OpenCL to enable performance portability across a diverse range of high-end many-core architectures. In particular, we have chosen to investigate 3D lattice Boltzmann codes (D3Q19 BGK). We have developed an OpenCL version of this code to provide cross-platform functional portability. We then compared its performance to optimized native versions on each target platform, including hybrid OpenMP/AVX versions on CPUs and Xeon Phi and CUDA versions on NVIDIA GPUs. Contrary to conventional wisdom, the results show that a set of straightforward techniques allows OpenCL to achieve a high degree of performance portability, at least for 3D lattice Boltzmann codes. The performance-portable OpenCL code is also highly competitive with the best performance achieved using the native parallel programming models on each platform.
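The D3Q19 BGK model named above stores 19 particle distributions per lattice cell. It updates each cell with a single-relaxation-time (BGK) collision toward a local equilibrium. The following is a minimal sketch of that per-cell collision step, written in plain Python for readability. It is not the authors' OpenCL kernel.

Several details are illustrative assumptions and are not taken from the paper:

* the ordering of the velocity set;
* the function names `moments`, `equilibrium` and `collide_bgk`;
* the relaxation time τ = 0.6.

In an OpenCL implementation this per-cell update would commonly be executed by one work-item per lattice site. A streaming (propagation) step, not shown here, would follow it.

```python
# Standard D3Q19 velocity set: 1 rest, 6 face and 12 edge directions
# (this particular ordering is an assumption made for illustration).
E = [(0, 0, 0),
     (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
     (1, 1, 0), (-1, -1, 0), (1, -1, 0), (-1, 1, 0),
     (1, 0, 1), (-1, 0, -1), (1, 0, -1), (-1, 0, 1),
     (0, 1, 1), (0, -1, -1), (0, 1, -1), (0, -1, 1)]

# Lattice weights: 1/3 (rest), 1/18 (face), 1/36 (edge); they sum to 1.
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def moments(f):
    """Return density rho and velocity (ux, uy, uz) of one cell's distributions."""
    rho = sum(f)
    u = tuple(sum(fi * e[d] for fi, e in zip(f, E)) / rho for d in range(3))
    return rho, u


def equilibrium(rho, u):
    """Second-order equilibrium: w_i * rho * (1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u)."""
    usq = u[0] * u[0] + u[1] * u[1] + u[2] * u[2]
    feq = []
    for w, e in zip(W, E):
        eu = e[0] * u[0] + e[1] * u[1] + e[2] * u[2]
        feq.append(w * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def collide_bgk(f, tau):
    """BGK collision for one cell: relax each f_i toward f_i^eq at rate 1/tau."""
    rho, u = moments(f)
    feq = equilibrium(rho, u)
    omega = 1.0 / tau
    return [fi - omega * (fi - fe) for fi, fe in zip(f, feq)]


if __name__ == "__main__":
    # Arbitrary non-equilibrium state for a single cell (illustrative values).
    f = [w * (1.0 + 0.01 * i) for i, w in enumerate(W)]
    f_post = collide_bgk(f, tau=0.6)
    print("before:", moments(f))
    print("after: ", moments(f_post))
```

The BGK collision conserves density and momentum, so the two printed lines should agree up to floating-point rounding. Only the distribution of mass among the 19 directions changes. Because each cell's collision depends only on that cell's own data, the step maps naturally onto data-parallel models such as OpenCL, CUDA or OpenMP.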