OpenCL Kernel Vectorization on the CPU, GPU, and FPGA: A Case Study with Frequent Pattern Compression

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-04-01 DOI:10.1109/FCCM.2019.00071

Zheming Jin, H. Finkel

{"title":"OpenCL Kernel Vectorization on the CPU, GPU, and FPGA: A Case Study with Frequent Pattern Compression","authors":"Zheming Jin, H. Finkel","doi":"10.1109/FCCM.2019.00071","DOIUrl":null,"url":null,"abstract":"OpenCL promotes code portability, and natively supports vectorized data types, which allows developers to potentially take advantage of the single-instruction-multiple-data instructions on CPUs, GPUs, and FPGAs. FPGAs are becoming a promising heterogeneous computing component. In our study, we choose a kernel used in frequent pattern compression as a case study of OpenCL kernel vectorizations on the three computing platforms. We describe different pattern matching approaches for the kernel, and manually vectorize the OpenCL kernel by a factor ranging from 2 to 16. We evaluate the kernel on an Intel Xeon 16-core CPU, an NVIDIA P100 GPU, and a Nallatech 385A FPGA card featuring an Intel Arria 10 GX1150 FPGA. Compared to the optimized kernel that is not vectorized, our vectorization can improve the kernel performance by a factor of 16 on the FPGA. The performance improvement ranges from 1 to 11.4 on the CPU, and from 1.02 to 9.3 on the GPU. The effectiveness of kernel vectorization depends on the work-group size.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2019.00071","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

OpenCL promotes code portability, and natively supports vectorized data types, which allows developers to potentially take advantage of the single-instruction-multiple-data instructions on CPUs, GPUs, and FPGAs. FPGAs are becoming a promising heterogeneous computing component. In our study, we choose a kernel used in frequent pattern compression as a case study of OpenCL kernel vectorizations on the three computing platforms. We describe different pattern matching approaches for the kernel, and manually vectorize the OpenCL kernel by a factor ranging from 2 to 16. We evaluate the kernel on an Intel Xeon 16-core CPU, an NVIDIA P100 GPU, and a Nallatech 385A FPGA card featuring an Intel Arria 10 GX1150 FPGA. Compared to the optimized kernel that is not vectorized, our vectorization can improve the kernel performance by a factor of 16 on the FPGA. The performance improvement ranges from 1 to 11.4 on the CPU, and from 1.02 to 9.3 on the GPU. The effectiveness of kernel vectorization depends on the work-group size.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

在CPU, GPU和FPGA上的OpenCL内核矢量化:频繁模式压缩的案例研究

OpenCL促进了代码的可移植性，并且原生地支持向量化数据类型，这使得开发人员能够潜在地利用cpu、gpu和fpga上的单指令多数据指令。fpga正在成为一种很有前途的异构计算组件。在我们的研究中，我们选择了一个用于频繁模式压缩的内核作为三个计算平台上OpenCL内核矢量化的案例研究。我们描述了内核的不同模式匹配方法，并通过2到16的因子对OpenCL内核进行了手动矢量化。我们在英特尔至强16核CPU, NVIDIA P100 GPU和具有英特尔Arria 10 GX1150 FPGA的Nallatech 385A FPGA卡上评估内核。与未向量化的优化内核相比，我们的向量化可以将FPGA上的内核性能提高16倍。CPU性能提升幅度为1 ~ 11.4,GPU性能提升幅度为1.02 ~ 9.3。核矢量化的有效性取决于工作组的大小。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量