{"title":"从OpenCL推断自定义架构","authors":"Krzysztof Kepa, Ritesh Soni, P. Athanas","doi":"10.1109/PATMOS.2015.7347581","DOIUrl":null,"url":null,"abstract":"OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC computing domain. However, in FPGA-based HPC systems, OpenCL-to-FPGA compilers often yield suboptimal results due to the rigid architecture, limited shared-memory, and non-existent inter-work-item communication pathways implied by the OpenCL model. In this work, a methodology of inferring application-specific OpenCL “work-item” interfaces based on kernel code analysis is explored. A proof-of-concept prototype is implemented using an OpenCL source-to-source translator, which allows automated generation of the FPGA-based hardware accelerators directly from the OpenCL sources. The type and implementation of the inferred interface is tailored to match the data access patterns within the kernel. The inferred interface outperforms limitations of the OpenCL rigid architecture and communication model. The presented approach achieves a ~30x speedup over the generic memory-based approach for a 16 work-items application. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. This technique is demonstrated on a popular bioinformatics algorithm, yet is applicable to any such algorithm with non-standard inter-cell communications.","PeriodicalId":325869,"journal":{"name":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Inferring custom architectures from OpenCL\",\"authors\":\"Krzysztof Kepa, Ritesh Soni, P. Athanas\",\"doi\":\"10.1109/PATMOS.2015.7347581\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC computing domain. However, in FPGA-based HPC systems, OpenCL-to-FPGA compilers often yield suboptimal results due to the rigid architecture, limited shared-memory, and non-existent inter-work-item communication pathways implied by the OpenCL model. In this work, a methodology of inferring application-specific OpenCL “work-item” interfaces based on kernel code analysis is explored. A proof-of-concept prototype is implemented using an OpenCL source-to-source translator, which allows automated generation of the FPGA-based hardware accelerators directly from the OpenCL sources. The type and implementation of the inferred interface is tailored to match the data access patterns within the kernel. The inferred interface outperforms limitations of the OpenCL rigid architecture and communication model. The presented approach achieves a ~30x speedup over the generic memory-based approach for a 16 work-items application. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. 
This technique is demonstrated on a popular bioinformatics algorithm, yet is applicable to any such algorithm with non-standard inter-cell communications.\",\"PeriodicalId\":325869,\"journal\":{\"name\":\"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-12-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PATMOS.2015.7347581\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PATMOS.2015.7347581","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
OpenCL has emerged as the de facto cross-platform standard in the GPU-based HPC domain. In FPGA-based HPC systems, however, OpenCL-to-FPGA compilers often yield suboptimal results because of the rigid architecture, limited shared memory, and absent inter-work-item communication pathways implied by the OpenCL model. This work explores a methodology for inferring application-specific OpenCL "work-item" interfaces from kernel code analysis. A proof-of-concept prototype is implemented as an OpenCL source-to-source translator that automatically generates FPGA-based hardware accelerators directly from OpenCL sources. The type and implementation of the inferred interface are tailored to match the data access patterns within the kernel, allowing it to overcome the limitations of the rigid OpenCL architecture and communication model. The presented approach achieves a ~30x speedup over the generic memory-based approach for a 16-work-item application. A set of OpenCL coding patterns targeting FPGA-based HPC systems is also introduced. The technique is demonstrated on a popular bioinformatics algorithm, yet it is applicable to any algorithm with non-standard inter-cell communication.
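For illustration, the following OpenCL C sketch shows the kind of kernel-level data access pattern such an analysis would target (the kernel name and arguments are hypothetical, not taken from the paper): in the generic memory-based model, a work-item can observe its neighbour only through local memory and a barrier, whereas an inferred interface could turn this exchange into a direct, application-specific communication path in the generated hardware.

```c
/*
 * Illustrative sketch only; the kernel and argument names are hypothetical.
 * Each work-item ("cell") publishes its value to local memory and, after a
 * barrier, reads the value of its left neighbour -- the generic OpenCL way
 * of expressing inter-cell communication that an inferred interface would
 * replace with a direct path between adjacent cells.
 */
__kernel void cell_update(__global const int *in,
                          __global int *out,
                          __local int *exchange)   /* one slot per work-item */
{
    const int lid = get_local_id(0);
    const int gid = get_global_id(0);

    int cell = in[gid];

    /* Generic memory-based exchange: local memory plus a work-group barrier. */
    exchange[lid] = cell;
    barrier(CLK_LOCAL_MEM_FENCE);

    int left = (lid > 0) ? exchange[lid - 1] : 0;

    /* Placeholder dependence on the left neighbour, standing in for the
     * recurrence of a systolic-style bioinformatics kernel. */
    out[gid] = cell + left;
}
```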