A Framework for Adding Low-Overhead, Fine-Grained Power Domains to CGRAs

Ankita Nayak, Keyi Zhang, Rajsekhar Setaluri, Alex Carsello, Makai Mann, S. Richardson, Rick Bahr, P. Hanrahan, M. Horowitz, Priyanka Raina
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2020
DOI: 10.23919/DATE48585.2020.9116477 (https://doi.org/10.23919/DATE48585.2020.9116477)
Citations: 4

Abstract

To effectively minimize static power across a wide range of applications, the power domains of a coarse-grained reconfigurable array (CGRA) need to be finer-grained than those of a typical ASIC. However, the special isolation logic needed to ensure electrical protection between powered-off and powered-on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that intrinsically provides boundary protection. This technique reduces the area overhead of boundary protection between power domains for the CGRA from around 9% to less than 1% and removes the delay of the isolation cells. However, with this design choice, we cannot leverage the conventional UPF-based flow to introduce power domain boundary protection. Instead, we create compiler-like passes that iteratively introduce the needed design transformations, and we formally verify the passes with satisfiability modulo theories (SMT) methods. These passes also allow us to optimize how we handle test and debug signals through the off tiles. We use our framework to insert power domains into an SoC with an ARM Cortex M3 processor and a CGRA with 32 × 16 processing element (PE) and memory tiles and 4 MB of secondary memory. Depending on the size of the applications mapped, our CGRA achieves up to an 83% reduction in leakage power and a 26% reduction in total power versus a CGRA without multiple power domains, for a range of image processing and machine learning applications.
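The abstract does not describe the routing circuit itself, so the following is only a toy sketch of how a routing multiplexer could, in principle, provide boundary protection without separate isolation cells. It assumes an AND-OR multiplexer with a one-hot select, models signals from a powered-off tile as an unknown value `X`, and uses exhaustive enumeration in place of the SMT-based verification the authors mention. All names (`and_or_mux`, `check_boundary_protection`) are hypothetical and are not taken from the paper.

```python
from itertools import product

X = "X"  # assumption: unknown value seen on a wire driven by a powered-off tile


def and3(a, b):
    """Three-valued AND: a known 0 on either side forces the output to 0."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X


def or3(a, b):
    """Three-valued OR: a known 1 on either side forces the output to 1."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X


def and_or_mux(inputs, select):
    """AND each input with its select bit, then OR the results together."""
    out = 0
    for sel_bit, value in zip(select, inputs):
        out = or3(out, and3(sel_bit, value))
    return out


def check_boundary_protection(num_inputs):
    """Exhaustively check that a one-hot select pointing at a powered-on
    input yields exactly that input, whatever the other inputs carry."""
    for inputs in product((0, 1, X), repeat=num_inputs):
        for chosen in range(num_inputs):
            if inputs[chosen] == X:
                continue  # selecting an off tile is treated as an illegal configuration
            select = [1 if i == chosen else 0 for i in range(num_inputs)]
            if and_or_mux(inputs, select) != inputs[chosen]:
                return False
    return True


if __name__ == "__main__":
    print(check_boundary_protection(4))
```

In this model the deselected AND gates clamp unknown inputs to 0, which is the role an isolation cell would otherwise play. Selecting an input from an off tile still propagates `X`, so a real flow would also have to constrain the configuration, which is the kind of property a formal check over the transformation passes could cover.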