
Proceedings of the 2016 International Symposium on Low Power Electronics and Design: Latest Papers

A Robust and Energy-Efficient Classifier Using Brain-Inspired Hyperdimensional Computing
Abbas Rahimi, P. Kanerva, J. Rabaey
The mathematical properties of high-dimensional (HD) spaces show remarkable agreement with behaviors controlled by the brain. Computing with HD vectors, referred to as "hypervectors," is a brain-inspired alternative to computing with numbers. Hypervectors are high-dimensional, holographic, and (pseudo)random with independent and identically distributed (i.i.d.) components. They provide for energy-efficient computing while tolerating hardware variation typical of nanoscale fabrics. We describe a hardware architecture for a hypervector-based classifier and demonstrate it with language identification from letter trigrams. The HD classifier is 96.7% accurate, 1.2% lower than a conventional machine learning method, operating with half the energy. Moreover, the HD classifier is able to tolerate 8.8-fold probability of failure of memory cells while maintaining 94% accuracy. This robust behavior with erroneous memory cells can significantly improve energy efficiency.
DOI: https://doi.org/10.1145/2934583.2934624; published 2016-08-08
Citations: 185
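The hypervector classifier described in the abstract above can be illustrated with a small sketch. This is not the paper's hardware design: it assumes bipolar (+1/-1) hypervectors, a cyclic shift as the order-encoding permutation, elementwise multiplication as binding, and summation as bundling, which is one common way to encode letter trigrams. The dimensionality `D`, the two training sentences, and all function names are hypothetical choices made for illustration.

```python
import random

D = 2000  # toy dimensionality, chosen arbitrarily for this sketch
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def make_item_memory(seed=0):
    """Assign each symbol a fixed pseudorandom bipolar (+1/-1) hypervector."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(D)] for ch in ALPHABET}


def rotate(vec, k):
    """Cyclic shift by k > 0 positions: a simple permutation that encodes order."""
    return vec[-k:] + vec[:-k]


def encode(text, items):
    """Bundle (sum) the hypervectors of all letter trigrams in the text."""
    text = "".join(ch for ch in text.lower() if ch in items)
    total = [0] * D
    for i in range(len(text) - 2):
        first = rotate(items[text[i]], 2)
        second = rotate(items[text[i + 1]], 1)
        third = items[text[i + 2]]
        for j in range(D):
            # Binding: elementwise product of the position-shifted letters.
            total[j] += first[j] * second[j] * third[j]
    return total


def cosine(u, v):
    """Cosine similarity between two integer vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def classify(text, prototypes, items):
    """Return the label whose prototype hypervector is most similar to the text."""
    query = encode(text, items)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


items = make_item_memory()
# Hypothetical one-sentence "training sets"; a real system would use large corpora.
prototypes = {
    "en": encode("the quick brown fox jumps over the lazy dog and runs through the forest", items),
    "es": encode("el rapido zorro marron salta sobre el perro perezoso y corre por el bosque", items),
}
print(classify("the lazy dog runs over the brown fox", prototypes, items))
```

Because distinct trigram hypervectors are nearly orthogonal in high dimensions, the similarity score is dominated by the trigrams a query shares with each prototype, which is also why corrupting a fraction of the components tends to change the outcome only slightly.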
Session details: Low Power Design Methodologies
A. Fahim
DOI: https://doi.org/10.1145/3256020; published 2016-08-08
Citations: 0
On Effective and Efficient Quality Management for Approximate Computing
Ting Wang, Qian Zhang, N. Kim, Q. Xu
Approximate computing, where computation quality is traded off for better performance and/or energy savings, has gained significant traction in both academia and industry. With approximate computing, we expect to obtain acceptable results, but how do we make sure the quality of the final results is acceptable? This challenging problem remains largely unexplored. In this paper, we propose an effective and efficient quality management framework to achieve controlled quality-efficiency tradeoffs. Specifically, at the offline stage, our solution automatically selects an appropriate approximator configuration, taking into account rollback recovery for large occasional errors, at minimum cost under the target quality requirement. Then, during online execution, our framework judiciously determines when and how to roll back, using cost-effective yet accurate quality predictors that synergistically combine the outputs of several basic lightweight predictors. Experimental results demonstrate that our proposed solution can achieve 11% to 23% energy savings compared to existing solutions under the target quality requirement.
DOI: https://doi.org/10.1145/2934583.2934608; published 2016-08-08
Citations: 19
SATS: An Ultra-Low Power Time Synchronization for Solar Energy Harvesting WSNs
Tongda Wu, Yongpan Liu, Hehe Li, C. Xue, H. Lee, Huazhong Yang
Reliable, ultra-low-power time synchronization is becoming increasingly important as energy harvesting sensor nodes grow in popularity. This paper proposes an untethered, probabilistic, ultra-low-power time synchronization method for energy-intermittent sensor networks. It avoids frequent RF communication with the assistance of a solar clock. The SATS system consists of two main parts: the synchronizer, a low-power solar clock module for time synchronization, and S3-Mapping, an offline sequence matching algorithm. Furthermore, we develop an improved version of S3-Mapping, which reduces the computational complexity from exponential to linear using redundancy models and the onion peeling method. The SATS system is validated by both simulations and a prototype, which show that second-level synchronization precision can be achieved with reasonable probability. Moreover, the energy consumption of time synchronization is reduced by over one to two orders of magnitude compared with an up-to-date low-power time synchronization protocol.
DOI: https://doi.org/10.1145/2934583.2934601; published 2016-08-08
Citations: 3
OS-based Resource Accounting for Asynchronous Resource Use in Mobile Systems
Farshad Ghanei, Pranav Tipnis, Kyle Marcus, Karthik Dantu, Steven Y. Ko, Lukasz Ziarek
One essential function of a modern operating system is to accurately account for the resource usage of the underlying hardware. This is especially important for computing systems that operate on battery power, since energy management requires accurately attributing resource use to processes. However, components such as sensors, actuators, and specialized network interfaces are often used in an asynchronous fashion, which makes it difficult to conduct accurate resource accounting. For example, a process that makes a request to a sensor may not be running on the processor for the full duration of the resource usage, and current resource accounting mechanisms fail to account accurately for such asynchronous uses. This paper proposes a new mechanism to accurately account for the asynchronous usage of resources in mobile systems. Our insight is that by accurately relating user requests to the kernel requests sent to a device and the corresponding device responses, we can accurately attribute resource use to the requesting process. Our prototype, implemented in Linux, demonstrates that we can accurately account for the usage of asynchronous resources such as GPS and WiFi.
DOI: https://doi.org/10.1145/2934583.2934639; published 2016-08-08
Citations: 7
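The idea of tying device requests and responses back to the requesting process can be sketched in user space. The following is not the paper's Linux mechanism: it is a minimal illustration that assumes a trace of timestamped request and response events tagged with a request ID, a constant device power while any request is outstanding, and an equal split of that power among concurrent requests. The event format, the function name `attribute_energy`, and the power figure are all hypothetical.

```python
from collections import defaultdict


def attribute_energy(events, device_power_w):
    """Charge device energy (joules) to the processes with outstanding requests.

    events: iterable of (timestamp_s, kind, request_id, pid), where kind is
    "request" or "response". The pid on a response is ignored, because the
    device itself does not know which process it is serving.
    """
    outstanding = {}                 # request_id -> pid of the requesting process
    energy_j = defaultdict(float)
    last_ts = None
    for ts, kind, request_id, pid in sorted(events, key=lambda e: e[0]):
        if outstanding and last_ts is not None:
            # Split the energy of the elapsed interval among open requests.
            share = (ts - last_ts) * device_power_w / len(outstanding)
            for owner in outstanding.values():
                energy_j[owner] += share
        if kind == "request":
            outstanding[request_id] = pid
        elif kind == "response":
            outstanding.pop(request_id, None)
        last_ts = ts
    return dict(energy_j)


# Hypothetical trace: two processes use the same sensor with overlapping requests.
trace = [
    (0.0, "request", 1, 100),
    (2.0, "request", 2, 200),
    (4.0, "response", 1, None),
    (5.0, "response", 2, None),
]
usage = attribute_energy(trace, device_power_w=0.5)
print(usage)
```

Note that neither process needs to be scheduled on the CPU during the intervals it is charged for, which is the asynchronous case that CPU-time-based accounting misses.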
Analysis and Design of Energy Efficient Time Domain Signal Processing
Zhengyu Chen, Jie Gu
Time domain signal processing (TDSP) encodes information in time rather than voltage and offers higher efficiency than conventional digital design. This paper presents a systematic analysis of the design principles and energy efficiency of TDSP. Variation impact, which poses significant challenges to TDSP, is evaluated, and a variation-driven design methodology is proposed to achieve an optimal tradeoff between energy efficiency and design robustness. Several novel circuit-level design techniques, such as a dual encoding strategy and bit-scalable design, are also proposed in this work to significantly improve the energy efficiency of TDSP. A design example based on a critical building block of a facial recognition application is used to demonstrate the potential of the technique. Results in a 45nm technology show that a 3.3X energy-delay product reduction and a 34% area saving can be achieved using TDSP compared with conventional digital design techniques.
DOI: https://doi.org/10.1145/2934583.2934585; published 2016-08-08
Citations: 10
Measurement-Driven Methodology for Evaluating Processor Heterogeneity Options for Power-Performance Efficiency
William J. Song, A. Buyuktosunoglu, Chen-Yong Cher, P. Bose
It is generally perceived that heterogeneous multicore processors will provide better performance and power efficiency than conventional homogeneous cores. However, heterogeneity can also be achieved within a homogeneous core design, instantiated under different voltage-frequency settings or per-core simultaneous multi-threading (SMT) modes. In this paper, we pursue an architectural study motivated by the question, "Can we get by with a single, complex SMT-equipped core design that can operate at different voltage-frequency points? Or, is it mandatory to invest in two different core types, one complex and the other simple?" We propose a systematic, measurement-driven methodology to evaluate processor heterogeneity options. Our analysis focuses particularly on the domain of real-time constrained embedded processors. The study is based on direct measurement of two real processors: one that uses simple in-order cores, and another that uses complex out-of-order cores. The effect of heterogeneous core composition (consisting of complex and simple cores in the same chip) is analytically projected from measurements gleaned from the two different systems. Our analysis yields interesting new insights. When dealing with two core types without SMT enabled, true core heterogeneity does not necessarily provide better performance or power efficiency under area and power constraints. If the complex-core homogeneous processor invokes SMT, it outperforms true heterogeneity by offering 28% better power efficiency, assuming that simple cores in the heterogeneous system operate only in single-threaded mode without SMT capability. If the small cores employ SMT, true heterogeneity yields 32% better power efficiency than the homogeneous processor with SMT.
DOI: https://doi.org/10.1145/2934583.2934637; published 2016-08-08
Citations: 2
Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster
Chen Zhang, Di Wu, Jiayu Sun, Guangyu Sun, Guojie Luo, J. Cong
Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices like GPGPUs. However, due to constrained on-chip resources and many other factors, single-board FPGA designs may have difficulty achieving optimal energy efficiency. In this paper, we present a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency. A dynamic programming algorithm is proposed to map the CNN computing layers efficiently to different FPGA boards. To demonstrate the potential of the architecture, we built a prototype system with seven FPGA boards connected by high-speed serial links. The experimental results on AlexNet and VGG-16 show that the prototype can achieve up to 21x and 2x the energy efficiency of optimized multi-core CPU and GPU implementations, respectively.
DOI: https://doi.org/10.1145/2934583.2934644; published 2016-08-08
Citations: 183
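The abstract above mentions a dynamic programming algorithm for mapping CNN layers onto FPGA boards but does not state its cost model. As a rough illustration only, the sketch below uses a generic linear-partition formulation: it assumes each board receives a contiguous run of layers and that pipeline throughput is limited by the slowest board, so the goal is to minimize the largest per-board cost. The function name `partition_layers` and the layer costs are hypothetical, not taken from the paper.

```python
def partition_layers(costs, boards):
    """Split layers into `boards` contiguous stages, minimizing the slowest stage.

    costs: per-layer cost (for example, estimated latency) in pipeline order.
    Returns (bottleneck, stages), where stages is a list of (start, end) index
    pairs with `end` exclusive. Every board receives at least one layer.
    """
    n = len(costs)
    if not 1 <= boards <= n:
        raise ValueError("need 1 <= boards <= number of layers")

    prefix = [0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    inf = float("inf")
    # best[k][i]: minimal bottleneck when the first i layers use k boards.
    best = [[inf] * (n + 1) for _ in range(boards + 1)]
    cut = [[0] * (n + 1) for _ in range(boards + 1)]
    best[0][0] = 0

    for k in range(1, boards + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):
                # Layers j..i-1 go on board k; earlier layers use k-1 boards.
                candidate = max(best[k - 1][j], prefix[i] - prefix[j])
                if candidate < best[k][i]:
                    best[k][i] = candidate
                    cut[k][i] = j

    # Walk the recorded cut points backwards to recover the stages.
    stages = []
    end = n
    for k in range(boards, 0, -1):
        start = cut[k][end]
        stages.append((start, end))
        end = start
    stages.reverse()
    return best[boards][n], stages


# Hypothetical per-layer costs for a six-layer network on three boards.
layer_costs = [4, 2, 3, 7, 1, 5]
bottleneck, stages = partition_layers(layer_costs, boards=3)
print(bottleneck, stages)
```

A realistic mapping would also have to account for inter-board link bandwidth and per-board resource limits, which this sketch leaves out.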
A Thermal-Aware Physical Space Allocation Strategy for 3D Flash Memory Storage Systems
Yi Wang, Mingxu Zhang, Lisha Dong, Xuan Yang
Three-dimensional (3D) flash memory stacks layers of data storage cells vertically to overcome the scaling limits of conventional planar NAND flash memory. Current 3D flash memory faces new challenges, including thermal issues and a complex manufacturing process. This paper presents TheraPhy, a novel thermal-aware physical space allocation strategy for three-dimensional flash memory storage systems. TheraPhy permutes the allocation of physical blocks. Consecutively accessed logical blocks are distributed to different physical locations in order to prevent the accumulation of hotspots. TheraPhy requires no changes to the file system, on-chip memory hierarchy, or hardware implementation of 3D flash memory. Based on TheraPhy, we present an address mapping strategy that is capable of determining the allocation of physical blocks based on their thermal status. We demonstrate the viability of the proposed technique using a set of extensive experiments. Experimental results show that TheraPhy can reduce the peak temperature by 15.39% with less than 1% extra erase overhead in comparison with the baseline scheme.
DOI: https://doi.org/10.1145/2934583.2934638; published 2016-08-08
Citations: 7
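The notion of choosing physical blocks according to their thermal status, as described in the abstract above, can be illustrated with a toy allocator. This is not TheraPhy itself: the thermal model (a fixed heat increment per write and a multiplicative cooling factor applied before each write), the class name `ThermalAwareAllocator`, and all parameter values are assumptions made purely for illustration. Erase operations and the 3D geometry of the chip are ignored.

```python
class ThermalAwareAllocator:
    """Toy logical-to-physical block mapper that writes to the coolest free block."""

    def __init__(self, num_blocks, heat_per_write=1.0, cooling=0.5):
        self.temp = [0.0] * num_blocks      # estimated heat per physical block
        self.free = set(range(num_blocks))  # physical blocks available for writing
        self.mapping = {}                   # logical block -> physical block
        self.heat_per_write = heat_per_write
        self.cooling = cooling

    def write(self, logical_block):
        """Write a logical block out of place and return the physical block used."""
        if not self.free:
            raise RuntimeError("no free physical blocks")
        # All blocks dissipate some heat before the next write lands.
        self.temp = [t * self.cooling for t in self.temp]
        # Pick the coolest free block; ties go to the lowest block index.
        target = min(self.free, key=lambda b: (self.temp[b], b))
        self.free.remove(target)
        previous = self.mapping.get(logical_block)
        if previous is not None:
            # Flash cannot overwrite in place, so the stale copy is released.
            self.free.add(previous)
        self.temp[target] += self.heat_per_write
        self.mapping[logical_block] = target
        return target


allocator = ThermalAwareAllocator(num_blocks=6)
placements = [allocator.write(lb) for lb in (0, 1, 2, 0, 1)]
print(placements)
```

In this run the fifth write skips physical block 0, which is free again but still warm from an earlier write, and lands on an unused block instead. A lowest-index-first allocator would have reused block 0 immediately.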
How to Cope with Slow Transistors in the Top-tier of Monolithic 3D ICs: Design Studies and CAD Solutions
S. Samal, D. Nayak, M. Ichihashi, S. Banna, S. Lim
In this paper, we study the impact of a low-thermal-budget process on design quality in monolithic 3D ICs (M3D). Specifically, we quantify how much the tier-to-tier transistor performance difference affects full-chip power and performance metrics in a foundry 14nm FinFET technology. Our study first shows that 5%, 10%, and 15% top-tier device degradation in a wire-dominated, timing-closed monolithic 3D IC design leads to 7%, 12%, and 18% full-chip timing violation, respectively. Next, we address this impact with our CAD solution, named the Tier-Aware M3D (TA-M3D) flow, which identifies potential timing-critical paths and partitions them into the faster (bottom) tier to minimize the impact of top-tier degradation. One unique challenge for timing closure in this case is how to conduct buffering and sizing on the paths that lie entirely in the top or bottom tier as well as on those that span both tiers. Our approach handles all three types of paths carefully and closes timing under the given top-tier degradation assumption, while minimizing the total power consumption. Our enhanced monolithic 3D IC designs, even with 5%, 10%, and 15% slower transistors in the top tier, still offer 26%, 24%, and 5% power savings over 2D ICs, respectively. Our study also covers other types of circuits.
DOI: 10.1145/2934583.2934643 (https://doi.org/10.1145/2934583.2934643). Published 2016-08-08 in Proceedings of the 2016 International Symposium on Low Power Electronics and Design.
Citations: 7