{"title":"Intel gpu的OpenCL工作组缩减分析","authors":"Grigore Lupescu, E. Slusanschi, N. Tapus","doi":"10.1109/SYNASC.2016.070","DOIUrl":null,"url":null,"abstract":"As hardware becomes more flexible in terms ofprogramming, software APIs must expose hardware features ina portable way. Additions in the OpenCL 2.0 API expose threadcommunication through the newly defined work-group functions. In this paper we focus on two implementations of the work-groupfunctions in the OpenCL compiler backend for Intel's GPUs. Wefirst describe the particularities of Intel's GEN GPU architectureand the Beignet OpenCL open source project. Both work-groupimplementations are then detailed, one based on thread to threadmessage passing while the other on thread to shared local memoryread/write. The focus is around choosing an optimal variant basedon how each implementation maps to the hardware and its impacton performance.","PeriodicalId":268635,"journal":{"name":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analysis of OpenCL Work-Group Reduce for Intel GPUs\",\"authors\":\"Grigore Lupescu, E. Slusanschi, N. Tapus\",\"doi\":\"10.1109/SYNASC.2016.070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As hardware becomes more flexible in terms ofprogramming, software APIs must expose hardware features ina portable way. Additions in the OpenCL 2.0 API expose threadcommunication through the newly defined work-group functions. In this paper we focus on two implementations of the work-groupfunctions in the OpenCL compiler backend for Intel's GPUs. Wefirst describe the particularities of Intel's GEN GPU architectureand the Beignet OpenCL open source project. Both work-groupimplementations are then detailed, one based on thread to threadmessage passing while the other on thread to shared local memoryread/write. The focus is around choosing an optimal variant basedon how each implementation maps to the hardware and its impacton performance.\",\"PeriodicalId\":268635,\"journal\":{\"name\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SYNASC.2016.070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SYNASC.2016.070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Analysis of OpenCL Work-Group Reduce for Intel GPUs
As hardware becomes more flexible in terms of programming, software APIs must expose hardware features in a portable way. Additions in the OpenCL 2.0 API expose thread communication through the newly defined work-group functions. In this paper we focus on two implementations of the work-group functions in the OpenCL compiler backend for Intel's GPUs. We first describe the particularities of Intel's GEN GPU architecture and the Beignet open-source OpenCL project. Both work-group implementations are then detailed: one is based on thread-to-thread message passing, the other on thread reads and writes to shared local memory. The focus is on choosing an optimal variant based on how each implementation maps to the hardware and on its impact on performance.
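For context, the sketch below illustrates the operation the paper analyzes. The first kernel calls the OpenCL 2.0 built-in work_group_reduce_add, whose lowering is what the two backend implementations differ on; the second hand-codes a tree reduction through shared local memory, similar in spirit to the thread-to-SLM variant. Kernel names are illustrative, the SLM kernel assumes a power-of-two local size, and neither reflects Beignet's actual generated code.

    /* Sketch: OpenCL 2.0 built-in work-group reduce. Each work-item
     * contributes one element; every work-item receives the group sum. */
    __kernel void reduce_builtin(__global const float *in,
                                 __global float *out) {
        float sum = work_group_reduce_add(in[get_global_id(0)]);
        if (get_local_id(0) == 0)
            out[get_group_id(0)] = sum;   /* one result per work-group */
    }

    /* Sketch of an SLM-style lowering: a barrier-synchronized tree
     * reduction over shared local memory (illustrative only; assumes
     * get_local_size(0) is a power of two). */
    __kernel void reduce_slm(__global const float *in,
                             __global float *out,
                             __local float *scratch) {
        size_t lid = get_local_id(0);
        scratch[lid] = in[get_global_id(0)];
        barrier(CLK_LOCAL_MEM_FENCE);
        for (size_t stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
            if (lid < stride)
                scratch[lid] += scratch[lid + stride];
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        if (lid == 0)
            out[get_group_id(0)] = scratch[0];
    }

The built-in requires building the program with -cl-std=CL2.0; the SLM scratch buffer is passed as a clSetKernelArg argument of local-memory size with a NULL pointer.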