Pushing the limits of accelerator efficiency while retaining programmability

Tony Nowatzki, Vinay Gangadhar, K. Sankaralingam, G. Wright
Venue: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Published: 2016-03-12
DOI: 10.1109/HPCA.2016.7446051 (https://doi.org/10.1109/HPCA.2016.7446051)
Citations: 43

Abstract

The waning benefits of device scaling have caused a push towards domain specific accelerators (DSAs), which sacrifice programmability for efficiency. While providing huge benefits, DSAs are prone to obsoletion due to domain volatility, have recurring design and verification costs, and have large area footprints when multiple DSAs are required in a single device. Because of the benefits of generality, this work explores how far a programmable architecture can be pushed, and whether it can come close to the performance, energy, and area efficiency of a DSA-based approach. Our insight is that DSAs employ common specialization principles for concurrency, computation, communication, data-reuse and coordination, and that these same principles can be exploited in a programmable architecture using a composition of known microarchitectural mechanisms. Specifically, we propose and study an architecture called LSSD, which is composed of many low-power and tiny cores, each having a configurable spatial architecture, scratchpads, and DMA. Our results show that a programmable, specialized architecture can indeed be competitive with a domain-specific approach. Compared to four prominent and diverse DSAs, LSSD can match the DSAs' 10× to 150× speedup over an OOO core, with only up to 4× more area and power than a single DSA, while retaining programmability.
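The area claim in the abstract can be read as an amortization argument: one LSSD costs up to 4× the area of a single DSA, but a device that needs several DSAs pays for each of them. The sketch below works through that arithmetic. It assumes, purely for illustration, that every DSA occupies the same unit area and that a single LSSD can stand in for all of them because the workloads do not run at the same time; neither assumption comes from the abstract, and the function name `lssd_area_ratio` is hypothetical.

```python
def lssd_area_ratio(num_dsas: int, lssd_overhead: float = 4.0) -> float:
    """Area of one LSSD relative to a chip that carries `num_dsas` DSAs.

    Illustrative assumptions (not from the paper): each DSA has the same
    unit area, and one LSSD costing `lssd_overhead` DSA-areas replaces
    all of them.
    """
    if num_dsas < 1:
        raise ValueError("num_dsas must be at least 1")
    return lssd_overhead / num_dsas


# A ratio above 1 means the LSSD is larger than the DSAs it replaces,
# a ratio below 1 means it is smaller.
for n in (1, 2, 4, 8):
    print(f"{n} DSA(s): LSSD area ratio = {lssd_area_ratio(n):.2f}")
```

Under these assumptions the break-even point is four DSAs; the 4.0 default is the abstract's upper bound, so a real design could break even sooner.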