2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Papers

Approximate memory compression for energy-efficiency
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009173
Ashish Ranjan, Arnab Raha, V. Raghunathan, A. Raghunathan
Memory subsystems are a major energy bottleneck in computing platforms due to frequent transfers between processors and off-chip memory. We propose approximate memory compression, a technique that leverages the intrinsic resilience of emerging workloads such as machine learning and data analytics to reduce off-chip memory traffic and energy. To realize approximate memory compression, we enhance the memory controller to be aware of memory regions that contain approximation-resilient data, and to transparently compress/decompress the data written to/read from these regions. To provide control over approximations, the quality-aware memory controller conforms to a specified error constraint for each approximate memory region. We design a software interface that programmers can use to identify data structures that are resilient to approximations. We also propose a runtime quality control framework that automatically determines the error constraints for the identified data structures such that a given target application-level quality is maintained. We evaluate our proposal by implementing a hardware prototype using the Intel UniPHY-DDR3 memory controller and NIOS-II processor, a Hynix DDR3 DRAM module, and a Stratix-IV FPGA development board. Across a suite of 8 machine learning benchmarks, approximate memory compression obtains a 1.28× benefit in DRAM energy and a simultaneous 11.5% improvement in execution time for a small (< 1.5%) loss in output quality.
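The abstract does not describe the compression scheme itself, so the following is only a minimal sketch of the general idea of compressing a block under a per-region error constraint. Uniform quantization is used here as a stand-in technique, and the names `quantize_block`, `dequantize_block`, and `compress_with_bound` are hypothetical, not taken from the paper.

```python
def quantize_block(values, bits):
    """Uniformly quantize a block to `bits` bits per value (stand-in scheme)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    if hi == lo:
        return lo, hi, [0] * len(values)
    scale = (hi - lo) / levels
    codes = [round((v - lo) / scale) for v in values]
    return lo, hi, codes


def dequantize_block(lo, hi, codes, bits):
    """Reconstruct approximate values from the quantization codes."""
    levels = (1 << bits) - 1
    if hi == lo:
        return [lo] * len(codes)
    scale = (hi - lo) / levels
    return [lo + c * scale for c in codes]


def compress_with_bound(values, max_abs_error, max_bits=16):
    """Pick the smallest bit width whose reconstruction error meets the bound.

    Returns (bits, lo, hi, codes), or None if the block should be stored
    uncompressed. The absolute-error bound is an assumed form of the
    per-region error constraint.
    """
    for bits in range(1, max_bits + 1):
        lo, hi, codes = quantize_block(values, bits)
        restored = dequantize_block(lo, hi, codes, bits)
        worst = max(abs(a - b) for a, b in zip(values, restored))
        if worst <= max_abs_error:
            return bits, lo, hi, codes
    return None


block = [0.0, 0.1, 0.25, 0.5, 1.0]
result = compress_with_bound(block, max_abs_error=0.05)
```

A tighter error constraint forces a larger bit width and therefore less traffic reduction, which is the quality/energy trade-off the abstract refers to.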
Citations: 24
Power optimizations in MTJ-based Neural Networks through Stochastic Computing
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009167
Ankit Mondal, Ankur Srivastava
Artificial Neural Networks (ANNs) have found widespread applications in tasks such as pattern recognition and image classification. However, hardware implementations of ANNs using conventional binary arithmetic units are computationally expensive and energy-intensive, and have large area overheads. Stochastic Computing (SC) is an emerging paradigm that replaces these conventional units with simple logic circuits and is particularly suitable for fault-tolerant applications. Spintronic devices, such as Magnetic Tunnel Junctions (MTJs), are capable of replacing CMOS in memory and logic circuits. In this work, we propose an energy-efficient use of MTJs, which exhibit probabilistic switching behavior, as Stochastic Number Generators (SNGs), which form the basis of our NN implementation in the SC domain. Further, error-resilient target applications of NNs allow us to introduce Approximate Computing, a framework wherein accuracy of computations is traded off for substantial reductions in power consumption. We propose approximating the synaptic weights in our MTJ-based NN implementation, in ways brought about by properties of our MTJ-SNG, to achieve energy-efficiency. We design an algorithm that can perform such approximations within a given error tolerance in a single-layer NN in an optimal way owing to the convexity of the problem formulation. We then use this algorithm and develop a heuristic approach for approximating multi-layer NNs. To give a perspective on the effectiveness of our approach, a 43% reduction in power consumption was obtained with less than 1% accuracy loss on a standard classification problem, with 26% being brought about by the proposed algorithm.
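To illustrate why SC can replace arithmetic units with simple logic, here is a minimal sketch of unipolar stochastic multiplication: a value p in [0, 1] is encoded as a bitstream whose bits are 1 with probability p, and a bitwise AND of two independent streams yields a stream encoding the product. A software pseudo-random generator stands in for the MTJ-based SNG, and all function names are hypothetical; this is not the paper's implementation.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a bitstream with P(bit = 1) = p.

    The pseudo-random generator is an assumed stand-in for a hardware SNG.
    """
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a unipolar bitstream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(bits_a, bits_b):
    """Multiply two independent unipolar streams with one AND gate per bit."""
    return [a & b for a, b in zip(bits_a, bits_b)]


rng = random.Random(0)
stream_a = to_bitstream(0.5, 4096, rng)
stream_b = to_bitstream(0.4, 4096, rng)
product = from_bitstream(sc_multiply(stream_a, stream_b))  # close to 0.2
```

The result is only approximate, and its accuracy improves with stream length, which is why SC suits error-tolerant workloads.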
Citations: 6
Signal strength-aware adaptive offloading for energy efficient mobile devices
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009182
Young Geun Kim, S. Chung
To prolong the battery life of mobile devices, applications often exploit offloading techniques that run computations on remote servers. Unfortunately, existing offloading techniques do not consider the fact that the data transmission time and energy consumption of wireless network interfaces increase exponentially as signal strength decreases. In this paper, we propose an adaptive offloading technique that considers signal strength. Our technique estimates the gain (reduced computation time and energy of mobile devices) and loss (increased data transmission time and energy of network interfaces) of offloading depending on signal strength. Based on the estimated gain and loss, our technique determines whether or not to offload computations to a server. In our evaluation, the proposed technique improves performance by 30.1% and reduces system-wide energy consumption by 25.0%, on average, compared to the conventional offloading technique that does not consider signal strength.
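The gain/loss comparison described above can be sketched as follows. The abstract does not give the estimation model, so everything here is an assumption for illustration: the `LINK_TABLE` values, the `Task` fields, the idle power, and the rule that offloading must win on both time and energy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    local_time_s: float    # time to run on the device
    local_power_w: float   # device power while computing locally
    remote_time_s: float   # time the server needs
    payload_bytes: int     # data that must be transmitted


# Hypothetical lookup: minimum signal strength (dBm) -> (bytes/s, radio power in W).
# Weaker signal means lower throughput and higher radio power.
LINK_TABLE = [(-60, 2_000_000, 0.8), (-75, 500_000, 1.0), (-90, 50_000, 1.4)]


def link_for(rssi_dbm):
    """Return (throughput, radio power) for a signal strength, or None if too weak."""
    for lower_bound, throughput, power in LINK_TABLE:
        if rssi_dbm >= lower_bound:
            return throughput, power
    return None


def should_offload(task, rssi_dbm, idle_power_w=0.3):
    """Offload only if the estimated time and energy are both lower than local."""
    link = link_for(rssi_dbm)
    if link is None:
        return False
    throughput, radio_power = link
    tx_time = task.payload_bytes / throughput
    offload_time = tx_time + task.remote_time_s
    offload_energy = tx_time * radio_power + task.remote_time_s * idle_power_w
    local_energy = task.local_time_s * task.local_power_w
    return offload_time < task.local_time_s and offload_energy < local_energy


task = Task(local_time_s=2.0, local_power_w=2.0, remote_time_s=0.2, payload_bytes=1_000_000)
strong = should_offload(task, rssi_dbm=-55)  # True with the assumed table
weak = should_offload(task, rssi_dbm=-85)    # False with the assumed table
```

With the assumed numbers, the same task is worth offloading on a strong link but not on a weak one, because transmission time dominates once throughput drops.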
Citations: 7
Placement mitigation techniques for power grid electromigration
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009178
Wei Ye, Yibo Lin, Xiaoqing Xu, Wuxi Li, Yiwei Fu, Yongsheng Sun, Canhui Zhan, D. Pan
In advanced technology nodes, power grid metal wires are prone to electromigration (EM) failures due to small wire sizes and high unidirectional current densities. Power grid EM failures usually happen around weak power grid connections delivering current to high power-consuming regions. Previously, power grid EM was mostly addressed at the post-routing stage, which may be too late for a large number of EM violations in modern designs. In this paper, we propose a new set of incremental placement techniques to mitigate power grid EM, including cell move, single row placement, and single tile placement. Experimental results demonstrate the proposed placement techniques can effectively reduce EM violations with negligible wirelength and placement density impacts.
Citations: 5
Secure Human-Internet using dynamic Human Body Communication
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009190
Shovan Maity, D. Das, X. Jiang, Shreyas Sen
Continuous miniaturization and cost reduction of unit computing have led to the prolific growth of smart wearable devices. These devices, present on and around the human body, form a complex network known as the Human-Intranet. The Human-Intranet is typically connected through a Wireless Body Area Network (WBAN). However, Human Body Communication (HBC) has recently emerged as an energy-efficient and secure alternative that uses the human body as the communication medium. Human-human and human-machine interactions create dynamic HBC channels, which allow these Human-Intranets to interact with each other, forming a Human-Internet. In this paper, we present the concept and demonstration of a Secure Human-Internet using dynamic HBC. We highlight important applications of the Human-Internet and discuss the architecture of a wearable Human-Internet device capable of communicating through inter-body dynamic HBC. A custom-built hardware prototype is used to demonstrate, for the first time, information exchange (e.g., of a business card) during handshaking. Dynamic signal transfer characteristics during inter-body communication through a handshake between two individuals wearing such devices are measured and analyzed. The effects of data transmission rate and handshake posture on HBC-based inter-body communication are explored to demonstrate its effectiveness and limitations under varying realistic scenarios. The specific COTS-based HBC implementation shows more than 8× better energy efficiency compared to the Bluetooth implementation.
Citations: 25
Transistor-level monolithic 3D standard cell layout optimization for full-chip static power integrity
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009189
B. W. Ku, Taigon Song, A. Nieuwoudt, S. Lim
Existing transistor-level monolithic 3D (T-M3D) standard cell layouts are based on the folding scheme, in which the pull-down network is simply folded and placed on top of the pull-up network. In this paper, we propose a new layout method, the stitching scheme, targeted towards improved cell performance and power integrity. We perform extensive analysis on each layout scheme and evaluate the timing/power benefits of the stitching scheme. Since the ground and power rails overlap in T-M3D layouts with the folding scheme, we also present a design methodology for the power delivery network of folding T-M3D ICs to evaluate the impact of the T-M3D cell layout scheme on static power integrity. Compared to 2D ICs at iso-performance, stitching T-M3D ICs show a maximum of 6% power savings and 44% area savings with only 1% more static IR-drop in the 14 nm technology node, while folding T-M3D ICs undergo serious degradation in static power integrity, causing a reliability issue.
Citations: 7
An energy-efficient and high-throughput bitwise CNN on sneak-path-free digital ReRAM crossbar
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009177
Leibin Ni, Zichuan Liu, Wenhao Song, J. Yang, Hao Yu, Kanwen Wang, Yuangang Wang
Convolutional neural network (CNN) based machine learning requires a hardware accelerator that is highly parallel and has low power consumption (including leakage power). In this paper, we present a digital ReRAM crossbar based CNN accelerator that can achieve significantly higher throughput and lower power consumption than the state of the art. The CNN is trained with binary constraints on both weights and activations such that all operations become bitwise. With further use of a 1-bit comparator, the bitwise CNN model can be naturally realized on a digital ReRAM-crossbar device. A novel sneak-path-free ReRAM-crossbar is further utilized for large-scale realization. Simulation experiments show that the bitwise CNN accelerator on the digital ReRAM crossbar achieves 98.3% and 91.4% accuracy on the MNIST and CIFAR-10 benchmarks, respectively. Moreover, it has a peak throughput of 792 GOPS at a power consumption of 6.3 mW, which is 18.86 times higher throughput and 44.1 times lower power than CMOS CNN (non-binary) accelerators.
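To show what "all operations become bitwise" can mean, here is a minimal software sketch of the commonly used XNOR-and-popcount form of a binary dot product, with +1 encoded as bit 1 and -1 as bit 0, followed by a threshold comparison. This is a generic illustration with hypothetical function names; it does not model the paper's ReRAM crossbar mapping.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer (+1 -> bit 1, -1 -> bit 0)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors via XNOR and popcount.

    Each agreeing position contributes +1 and each disagreeing one -1,
    so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask
    return 2 * bin(agree).count("1") - n


def binary_neuron(a_bits, w_bits, n, threshold=0):
    """Binary activation: compare the dot product against a threshold."""
    return 1 if binary_dot(a_bits, w_bits, n) >= threshold else 0


activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
dot = binary_dot(pack(activations), pack(weights), len(weights))
```

The packed result matches the ordinary dot product of the two +1/-1 vectors, while using only bitwise logic, a bit count, and a comparison.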
Citations: 22
Gabor filter assisted energy efficient fast learning Convolutional Neural Networks
Pub Date : 2017-05-12 DOI: 10.1109/ISLPED.2017.8009202
Syed Shakib Sarwar, P. Panda, K. Roy
Convolutional Neural Networks (CNNs) are being increasingly used in computer vision for a wide range of classification and recognition problems. However, training these large networks demands substantial computation time and energy; hence, their energy-efficient implementation is of great interest. In this work, we reduce the training complexity of CNNs by replacing certain weight kernels of a CNN with Gabor filters. The convolutional layers use the Gabor filters as fixed weight kernels, which extract intrinsic features, alongside regular trainable weight kernels. This combination creates a balanced system that gives better training performance in terms of energy and time, compared to the standalone CNN (without any Gabor kernels), in exchange for tolerable accuracy degradation. We show that the accuracy degradation can be mitigated by partially training the Gabor kernels for a small fraction of the total training cycles. We evaluated the proposed approach on 4 benchmark applications. Simple tasks such as face detection and character recognition (MNIST and TiCH) were implemented using the LeNet architecture, while a more complex task of object recognition (CIFAR10) was implemented on a state-of-the-art deep CNN (Network in Network) architecture. The proposed approach yields a 1.31–1.53× improvement in training energy in comparison to the conventional CNN implementation. We also obtain improvements of up to 1.4× in training time, up to 2.23× in storage requirements, and up to 2.2× in memory access energy. The accuracy degradation suffered by the approximate implementations is within 0–3% of the baseline.
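For readers unfamiliar with Gabor filters, the sketch below builds a small bank of fixed kernels from the standard real-valued Gabor formula (a Gaussian envelope multiplied by a cosine carrier). The kernel size, orientations, and other parameter values are illustrative assumptions, not the settings used in the paper, and `gabor_kernel` is a hypothetical helper name.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows (size must be odd).

    sigma: Gaussian envelope width, theta: orientation in radians,
    lambd: carrier wavelength, gamma: aspect ratio, psi: phase offset.
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A bank of four orientations that could serve as fixed (non-trainable) kernels.
bank = [gabor_kernel(5, sigma=2.0, theta=i * math.pi / 4, lambd=4.0) for i in range(4)]
```

Because such kernels are computed from a formula rather than learned, they need no gradient updates, which is the source of the training savings the abstract describes.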
Citations: 92