Convolution engine: balancing efficiency & flexibility in specialized computing

Proceedings of the 40th Annual International Symposium on Computer Architecture
Pub Date: 2013-06-23
DOI: 10.1145/2485922.2485925

W. Qadeer, R. Hameed, Ofer Shacham, P. Venkatesan, C. Kozyrakis, M. Horowitz


Citations: 187

Abstract

This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications. We present an example, the Convolution Engine (CE), specialized for the convolution-like data-flow that is common in computational photography, image processing, and video processing applications. CE achieves energy efficiency by capturing data reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We quantify the tradeoffs in efficiency and flexibility and demonstrate that CE is within a factor of 2-3x of the energy and area efficiency of custom units optimized for a single kernel. CE improves energy and area efficiency by 8-15x over a SIMD engine for most applications.
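The data-reuse point in the abstract can be illustrated with a toy example. The sketch below is not the CE design and is not taken from the paper; it is a minimal Python model, under the assumption of a 1D convolution-like (sliding-window multiply-accumulate) kernel, that counts input loads with and without a small local window buffer. The function names `conv1d_naive` and `conv1d_windowed` are hypothetical.

```python
from collections import deque


def conv1d_naive(signal, taps):
    """Sliding-window multiply-accumulate that reloads every input
    in the window for each output (one load per operation)."""
    k = len(taps)
    loads = 0
    out = []
    for i in range(len(signal) - k + 1):
        acc = 0
        for j in range(k):
            acc += signal[i + j] * taps[j]
            loads += 1  # each operand is fetched again from "memory"
        out.append(acc)
    return out, loads


def conv1d_windowed(signal, taps):
    """Same computation, but the last k inputs stay in a small
    shift-register-like buffer, so each input is loaded only once."""
    k = len(taps)
    window = deque(maxlen=k)  # hypothetical stand-in for local storage
    loads = 0
    out = []
    for x in signal:
        window.append(x)
        loads += 1  # one fetch per input element
        if len(window) == k:
            out.append(sum(w * t for w, t in zip(window, taps)))
    return out, loads


if __name__ == "__main__":
    signal = list(range(10))
    taps = [1, 2, 1]
    naive_out, naive_loads = conv1d_naive(signal, taps)
    win_out, win_loads = conv1d_windowed(signal, taps)
    macs = len(naive_out) * len(taps)
    print("outputs match:", naive_out == win_out)
    print("ops per load (naive):   ", macs / naive_loads)
    print("ops per load (windowed):", macs / win_loads)
```

In this toy model both versions perform the same multiply-accumulate operations, but the windowed version issues fewer loads, so the number of operations per memory access rises with the window size. This only mirrors the general idea the abstract describes and says nothing about how CE is actually organized.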


