
2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Publications

CORAL: Coarse-grained reconfigurable architecture for Convolutional Neural Networks
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009162
Zhe Yuan, Yongpan Liu, Jinshan Yue, Jinyang Li, Huazhong Yang
The Convolutional Neural Network (CNN) has become one of the most successful technologies for visual classification and other applications. As CNN models continue to evolve and adopt different kernel sizes in various applications, the hardware architecture must support reconfigurability. Previous FPGAs and programmable ASICs are fine-grained reconfigurable, but at the cost of energy efficiency. Considering the specific features of CNNs, this paper presents an energy-efficient coarse-grained reconfigurable architecture, denoted CORAL. An application-specific configuration neural block is proposed for convolution operations with reconfigurable data quantization to reduce both energy consumption and on-chip memory requirements. An optimal data loading strategy is presented for CORAL to achieve the best energy efficiency. Experimental results show that CORAL improves energy efficiency by 80.0% while reducing chip area by 78.9% and reconfiguration time by 81.0% compared with the best up-to-date programmable ASIC solution.
Citations: 11
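The CORAL abstract mentions reconfigurable data quantization as a way to reduce energy and on-chip memory, but this listing does not describe the scheme itself. The sketch below only illustrates the general idea with uniform symmetric quantization at a selectable bit width. The names `quantize` and `storage_bits`, the value range, and the bit widths are assumptions made for illustration and are not taken from the paper.

```python
def quantize(values, bits, max_abs=1.0):
    """Uniformly quantize floats in [-max_abs, max_abs] to signed `bits`-bit codes.

    Hypothetical illustration only; not the quantizer used in CORAL.
    Returns (codes, scale) where code * scale approximates the input value.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    qmax = 2 ** (bits - 1) - 1                     # largest positive code, e.g. 127 for 8 bits
    scale = max_abs / qmax                         # real value represented by one code step
    codes = []
    for v in values:
        clipped = max(-max_abs, min(max_abs, v))   # saturate out-of-range inputs
        codes.append(round(clipped * qmax / max_abs))
    return codes, scale


def storage_bits(count, bits):
    """On-chip storage, in bits, for `count` values stored at `bits` bits each."""
    return count * bits


weights = [1.0, -1.0, 0.0, 2.0]
codes8, scale8 = quantize(weights, bits=8)
codes4, scale4 = quantize(weights, bits=4)
```

In this toy model, halving the bit width halves the storage returned by `storage_bits`, which is the kind of memory saving a configurable bit width is meant to expose; the accuracy cost of the coarser step size is not modeled here.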
Monolithic 3D IC designs for low-power deep neural networks targeting speech recognition
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009175
Kyungwook Chang, Deepak Kadetotad, Yu Cao, Jae-sun Seo, S. Lim
In recent years, deep learning has become widespread for various real-world recognition tasks. In addition to recognition accuracy, energy efficiency is another grand challenge to enable local intelligence in edge devices. In this paper, we investigate the adoption of monolithic 3D IC (M3D) technology for deep learning hardware design, using speech recognition as a test vehicle. M3D has recently proven to be one of the leading contenders to address the power, performance and area (PPA) scaling challenges in advanced technology nodes. Our study encompasses the influence of key parameters in DNN hardware implementations towards energy efficiency, including DNN architectural choices, underlying workloads, and tier partitioning choices in M3D. Our post-layout M3D designs, together with hardware-efficient sparse algorithms, produce power savings beyond what can be achieved using conventional 2D ICs. Experimental results show that M3D offers 22.3% iso-performance power saving, convincingly demonstrating its entitlement as a solution for DNN ASICs. We further present architectural guidelines for M3D DNNs to maximize the power saving.
Citations: 10
A 32nm, 0.65–10GHz, 0.9/0.3 ps/σ TX/RX jitter single inductor digital fractional-n clock generator for reconfigurable serial I/O
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009160
William Y. Li, Hyung Seok Kim, K. Chandrashekar, K. M. Nguyen, A. Ravi
In CPUs, SoCs, GPUs, and PC-on-chip designs, I/O power consumption can be significant. To improve power efficiency, I/O bundles in groups of 4, 8, or 16 bits should scale their data rate according to the application requirements. However, the clocking architecture makes it challenging to support different data rates simultaneously. In high-bandwidth I/O, LC oscillators are preferred for low jitter, but their limited frequency range confines the data rate tuning. Multiple LC-PLLs are costly in area and power, and sometimes infeasible due to a heavily congested I/O area. Worse still, coupling between inductors could lead to PLL pulling, which closes the sampling eye. In this paper, a reconfigurable 0.65–10GHz digital fractional-n clock generator for serial I/O is presented, using a single LC PLL and calibrated 0.75/1.25/1.75 digital fractional post dividers. The architecture enables I/O driven by the same PLL to operate at different data rates, thereby reducing power. In addition, multiple LC-PLLs are replaced by one, reducing area, power, and coupling between LC oscillators. The PLL incorporates a staggered varactor, a wide-tuning VCO, and a hysteretic redundant frequency acquisition for improved temperature stability. The prototype in a 32nm high-k metal gate process has a measured TX/RX jitter of 0.9/0.3 ps/σ and dissipates 36.2mW from a 1.05V supply.
Citations: 0
Exploring sparsity of firing activities and clock gating for energy-efficient recurrent spiking neural processors
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009197
Yu Liu, Yingyezhe Jin, Peng Li
As a model of recurrent spiking neural networks, the Liquid State Machine (LSM) offers a powerful brain-inspired computing platform for pattern recognition and machine learning applications. Because it operates by processing neural spiking activities, the LSM naturally lends itself to an efficient hardware implementation that exploits the typical sparse firing patterns that emerge from the recurrent neural network and smartly processes the computational tasks orchestrated by different firing events at runtime. We explore these opportunities by presenting an LSM processor architecture with integrated on-chip learning and its FPGA implementation. Our LSM processor leverages the sparsity of firing activities to allow for efficient event-driven processing and activity-dependent clock gating. Using the spoken English letters adopted from the TI46 [1] speech recognition corpus as a benchmark, we show that the proposed FPGA-based neural processor system is up to 29% more energy efficient than a baseline LSM processor with little extra hardware overhead.
Citations: 6
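The LSM abstract describes event-driven processing that exploits sparse firing. As a rough software analogy (not the authors' FPGA design), the sketch below advances a small spiking network by touching only the outgoing synapses of neurons that fired in the previous step. The function name, the leaky integrate-and-fire update, and all parameter values are assumptions chosen for illustration; in hardware, units with no pending events are the ones whose clock could be gated.

```python
def event_driven_step(potentials, fired, fanout, weights, threshold=1.0, leak=0.9):
    """Advance one time step, visiting only synapses of neurons that fired.

    potentials: membrane potential per neuron (list of floats)
    fired:      set of neuron indices that spiked in the previous step
    fanout:     fanout[i] lists the neurons that neuron i projects to
    weights:    weights[(i, j)] is the synaptic weight from i to j
    Returns (new_potentials, new_fired, synaptic_updates_performed).
    """
    new_potentials = [p * leak for p in potentials]   # passive leak for every neuron
    updates = 0
    for i in fired:                                   # silent neurons add no synaptic work
        for j in fanout[i]:
            new_potentials[j] += weights[(i, j)]
            updates += 1
    new_fired = set()
    for j, p in enumerate(new_potentials):
        if p >= threshold:
            new_fired.add(j)
            new_potentials[j] = 0.0                   # reset after a spike
    return new_potentials, new_fired, updates


fanout = {0: [1, 2], 1: [2], 2: [0]}
weights = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.6, (2, 0): 0.2}
state, spikes, work = event_driven_step([0.0, 0.0, 0.0], {0}, fanout, weights)
```

The returned update count grows with the number of spikes rather than with the total number of synapses, which is the property that makes sparse firing attractive for energy savings.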
A tunable Ultra Low Power inductorless Low Noise Amplifier exploiting body biasing of 28 nm FDSOI technology
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009161
J. Zaini, F. Hameau, T. Taris, D. Morche, P. Audebert, E. Mercier
This paper presents the design of an Ultra Low Power (ULP) inductorless Low Noise Amplifier (LNA) based on a Common Gate (CG) architecture using the back gate control of Fully-Depleted Silicon-On-Insulator (FDSOI) technology. It demonstrates the potential of back biasing to lower the power consumption by more than 30% compared to a design without back biasing, while keeping similar performance. This paper also shows that the back gate control of this technology makes it possible to reach additional performance modes, which suits the design of tunable LNAs. The proposed LNA has been implemented in STMicroelectronics 28 nm FDSOI technology and its active area is only 0.0015 mm². The measured performance exhibits more than 16 dB of voltage gain (Gv), 7.3 dB of Noise Figure (NF), and an input-referred third-order Intercept Point (IIP3) of −16 dBm. The total power consumption is 300 µW from a 0.6 V supply. The same LNA reached other performance modes at a constant Figure of Merit (FoM).
Citations: 9
Frequency governors for cloud database OLTP workloads
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009183
Rathijit Sen, A. Halverson
Dynamically controlling processor frequency to save power while meeting customer Service-Level Objectives (SLOs) can reduce the cost of goods sold for cloud service providers. However, resource governance for Online Transaction Processing (OLTP) workloads in the cloud is complicated by throughput constraints, latency constraints, shallow sleep states that lower processor utilization, and (often) isolation of applications from hardware resource governors. This paper demonstrates a novel frequency governor that improves upon existing Intel P-state and Cpufreq governors in saving power for a cloud OLTP benchmark on Microsoft SQL Server for Linux.
Citations: 3
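The abstract above does not describe how its frequency governor decides. To make the general idea of SLO-aware frequency control concrete, the sketch below shows a simple feedback rule: raise the frequency when tail latency violates the SLO, lower it when there is ample slack. The function `next_frequency`, the `headroom` parameter, and the frequency table are hypothetical; this is not the authors' governor and it does not use any real Intel P-state or Cpufreq interface.

```python
def next_frequency(current_mhz, p99_latency_ms, slo_ms, available_mhz, headroom=0.8):
    """Pick the next CPU frequency from an ascending list of available frequencies.

    Steps up one level when latency violates the SLO, steps down one level when
    latency is below headroom * slo_ms, and otherwise holds the current level.
    Hypothetical feedback rule for illustration only.
    """
    idx = available_mhz.index(current_mhz)
    if p99_latency_ms > slo_ms and idx < len(available_mhz) - 1:
        return available_mhz[idx + 1]      # SLO violated: raise frequency
    if p99_latency_ms < headroom * slo_ms and idx > 0:
        return available_mhz[idx - 1]      # ample slack: lower frequency to save power
    return current_mhz                     # within the band, or already at a limit


levels = [1200, 1800, 2400]
choice = next_frequency(1800, p99_latency_ms=12.0, slo_ms=10.0, available_mhz=levels)
```

The `headroom` band keeps the rule from oscillating between two levels when latency sits just under the SLO; a production governor would also need to account for the throughput constraints and sleep-state effects the abstract lists.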
Hotspot monitoring and Temperature Estimation with miniature on-chip temperature sensors
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009191
P. K. Chundi, Yini Zhou, Martha A. Kim, E. Kursun, Mingoo Seok
This paper presents an analysis and evaluation of the impact of the size and voltage scalability of on-chip temperature sensors on the accuracy of hotspot monitoring and temperature estimation in dynamic thermal management of high-performance microprocessors. The analysis covers both the layout level and the system level across state-of-the-art sensors in terms of accuracy, voltage scalability, and silicon footprint. Our analysis shows that a sensor with a compact footprint and good voltage scalability can be placed on exact hotspot locations, typically among digital cells. This significantly improves accuracy in tracking hotspots and estimating the temperature of microarchitecture blocks compared to two other sensors that have higher sensor-circuit accuracy but a large footprint and little voltage scalability, which limit flexible placement.
Citations: 7
Write-energy-saving ReRAM-based nonvolatile SRAM with redundant bit-write-aware controller for last-level caches
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009153
Tsai-Kan Chien, L. Chiou, Yi-Sung Tsou, S. Sheu, Pei-Hua Wang, M. Tsai, Chih-I Wu
Nonvolatile static random-access memory (NV-SRAM) is a crucial component type for normally-off computing systems. This work proposes a novel 10T2R resistive random-access memory (ReRAM)-based NV-SRAM controller that is aware of redundant bit writes and considers the conditions of redundant bit writes. When data stored in SRAM cells are the same as the data in ReRAM devices, backup can be skipped. Otherwise, backup is performed. As a result, redundant bit-write conditions indicate that energy can be saved when backing up data. Simulations show that as much as 93% of typical energy requirements can be saved when the high resistive state is larger than 10 MΩ. As long as the probability of redundant bit writes is larger than 25%, backup energy saving can be achieved. The ReRAM chip is manufactured with 90 nm CMOS technology and the ReRAM process of the Industrial Technology Research Institute. This design can be applied to L2 and L3 caches.
Citations: 4
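The NV-SRAM abstract states that backup can be skipped for bits whose SRAM value already matches the ReRAM value. A minimal software sketch of that decision follows, assuming a per-bit compare before each write. The function names and the toy energy model are assumptions for illustration; the compare-to-write energy ratio of 0.25 is chosen only so that the break-even point mirrors the 25% threshold quoted in the abstract, and it is not a figure from the paper.

```python
def backup_word(sram_word, reram_word, width=8):
    """Return (new_reram_word, bits_written) for a redundant-write-aware backup.

    Only bit positions where SRAM and ReRAM differ are counted as written.
    """
    mask = (1 << width) - 1
    diff = (sram_word ^ reram_word) & mask     # 1 wherever the stored bits differ
    bits_written = bin(diff).count("1")
    return sram_word & mask, bits_written


def backup_energy(bits_written, width, e_write, e_compare):
    """Toy energy model: every bit is compared, only differing bits are written."""
    return width * e_compare + bits_written * e_write


new_word, written = backup_word(0b10110010, 0b10110000)
aware = backup_energy(written, width=8, e_write=1.0, e_compare=0.25)
naive = 8 * 1.0                                # unconditional backup writes all 8 bits
```

Under this model the aware scheme wins whenever the fraction of redundant bits exceeds `e_compare / e_write`; with the assumed ratio of 0.25, writing 6 of 8 bits costs exactly as much as the unconditional backup.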
SENIN: An energy-efficient sparse neuromorphic system with on-chip learning
Pub Date : 2017-07-24 DOI: 10.1109/ISLPED.2017.8009174
Myung-Hoon Choi, Seungkyu Choi, Jaehyeong Sim, L. Kim
Applying highly accurate neural networks to mobile devices encounters energy problems in battery-limited mobile environments. To resolve these problems, neuromorphic hardware solutions that enable event-driven operation have been proposed. In this work, we present a novel sparse neuromorphic system that implements an E-I Net algorithm to further improve energy efficiency. We introduce a neuron clock-gating technique that significantly reduces energy consumption by predicting future neuron spike activity without any loss of accuracy. We also propose synaptic pruning to save additional energy with minimal impact on classification accuracy. For fast adaptation to a changing environment, a learning algorithm is implemented in the proposed system. Compared to prior studies, our experimental results illustrate that the proposed system achieves 5.3×–11.4× energy efficiency improvement with comparable accuracy.
Citations: 2
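The SENIN abstract proposes synaptic pruning but this listing does not say which criterion is used. The sketch below shows one common option, magnitude-based pruning, purely as an illustration. The function name `prune_synapses` and the threshold value are assumptions and should not be read as the paper's method.

```python
def prune_synapses(weights, threshold):
    """Zero out synapses whose absolute weight is below `threshold`.

    weights: list of rows, each a list of float synaptic weights
    Returns (pruned_weights, fraction_pruned), where the fraction counts
    nonzero synapses that were set to zero, relative to all synapses.
    Hypothetical magnitude criterion for illustration only.
    """
    pruned = [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]
    total = sum(len(row) for row in weights)
    removed = sum(
        1
        for row_orig, row_new in zip(weights, pruned)
        for w_orig, w_new in zip(row_orig, row_new)
        if w_orig != 0.0 and w_new == 0.0
    )
    return pruned, (removed / total if total else 0.0)


synapses = [[0.5, -0.01], [0.02, -0.8]]
sparse, fraction = prune_synapses(synapses, threshold=0.05)
```

Each pruned synapse is one fewer memory access and accumulation per spike, which is where the additional energy saving would come from; the effect on classification accuracy has to be measured separately.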
Online tuning of Dynamic Power Management for efficient execution of interactive workloads
Pub Date : 2017-07-01 DOI: 10.1109/ISLPED.2017.8009195
James R. B. Bantock, V. Tenentes, B. Al-Hashimi, G. Merrett
Modern mobile devices contain powerful Multi-Processor System-on-Chips (MPSoCs) that are performance throttled by Dynamic Power Management (DPM) runtime systems to extend battery lifetime. Applications on mobile devices commonly generate highly interactive workloads, dependent on interaction between the processor cores, peripherals, external resources and the user, such as touch input during web-browsing. Inevitably, a subset of interactive workloads are affected by delays caused by data unavailability, e.g. loss or delay of data packets during voice-over-IP. At the same time, the system is required to respond quickly upon data retrieval to ensure that the user Quality of Experience (QoE) metrics (frame-rate, latency, etc.) are not degraded. Traditionally, operating systems have mitigated this problem with periodic sampling or event-driven approaches. Through experimentation using a mobile MPSoC platform, however, we demonstrate that improving the tuning of DPM parameters for certain interactive user inputs can provide energy savings of up to 21% or QoE improvements of up to 36%, when compared with the traditional approach. To capture these improvements, we propose a dynamic modeling of user input and data resource access times (e.g. mobile network bandwidth and latency) for interactive workloads, which is based on workload profiling and which we refer to herein as inelasticity analysis. The proposed approach is implemented through online tuning of a DPM runtime in the Android operating system and is validated through a Monte Carlo simulation of interactive workloads. In comparison to the default DPM tuning, the proposed approach achieves energy savings of 13% or QoE improvement of 27% or a selectable trade-off, e.g. 9% energy savings and 15% QoE improvement.
Citations: 4
Journal
2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)