
IEEE/ACM International Symposium on Low Power Electronics and Design: Latest Papers

Near-/sub-threshold DLL-based clock generator with PVT-aware locking range compensation
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993597
Ming-Hung Chang, Chung-Ying Hsieh, Mei-Wei Chen, W. Hwang
A near-/sub-threshold programmable clock generator is proposed in this paper. The major challenge for ultra-low voltage (ULV) circuits is that the lock-in range of the delay line is easily affected by environmental variations. The proposed clock generator includes a PVT compensation unit, which consists of a set of delay lines and a PVT detector. The unit adjusts the lock-in range of the clock generator to guarantee successful clock lock. In addition, variation-aware logic design is applied in the clock generator, which improves its reliability under process variation. The adoption of a pulse-circulating scheme also suppresses process-induced output clock jitter. Furthermore, the generator can produce an output clock with a frequency from 1/8 to 4 times that of the reference clock. The clock generator has been designed in UMC 65 nm CMOS technology. The reference clock frequencies are 625 kHz at 0.2 V and 5 MHz at 0.5 V. The power consumption is 0.18 μW at 0.2 V and 5.17 μW at 0.5 V. The core area of the clock generator is 0.01 mm².
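The figures quoted in this abstract imply a few derived quantities. The following is a small arithmetic sketch, not taken from the paper: it computes the output frequency range (1/8 to 4 times the reference) and the energy per reference cycle, under the assumption that each quoted power figure is the average power at the quoted reference frequency. The function names are made up for illustration.

```python
# Operating points quoted in the abstract: (supply in V, reference in Hz, power in W).
OPERATING_POINTS = [(0.2, 625e3, 0.18e-6), (0.5, 5e6, 5.17e-6)]


def output_range_hz(f_ref_hz):
    """Output clock range: 1/8 to 4 times the reference frequency."""
    return f_ref_hz / 8, f_ref_hz * 4


def energy_per_ref_cycle_j(power_w, f_ref_hz):
    """Average energy per reference cycle (assumes power is an average)."""
    return power_w / f_ref_hz


for vdd, f_ref, power in OPERATING_POINTS:
    low, high = output_range_hz(f_ref)
    energy = energy_per_ref_cycle_j(power, f_ref)
    print(f"{vdd} V: {low / 1e3:.3f} kHz to {high / 1e6:.1f} MHz, "
          f"{energy * 1e12:.3f} pJ per reference cycle")
```

Under that assumption, the 0.2 V point costs roughly a quarter of the energy per reference cycle of the 0.5 V point.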
Citations: 3
TLB index-based tagging for cache energy reduction
Pub Date : 2011-08-01 DOI: 10.5555/2016802.2016828
Jongmin Lee, Seokin Hong, Soontae Kim
Conventional cache tag matching uses addresses to identify the correct data in caches. However, this tagging scheme is inefficient because the tags are unnecessarily large. We observe that there are few unique tags because working sets are typically small, and these working sets are conventionally captured by TLBs. To exploit this fact, we propose a TLB index-based cache tagging scheme. The new scheme reduces the required number of tag bits to one-fourth of that of the conventional tagging scheme. The reduced tag bits decrease the tag-bit array area by 72% and its energy consumption by 58%. In our experiments, the proposed tagging scheme reduces instruction cache energy consumption by 13% for embedded systems.
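To see where a one-fourth reduction in tag bits could come from, here is a rough sketch of the bit counting. The configuration is an assumption chosen for illustration (32-bit addresses, a cache way the size of a 4 KB page so that the conventional tag equals the page number, and a 32-entry TLB); the abstract does not state the paper's actual parameters.

```python
def log2_exact(n):
    """Return log2(n) for a positive power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def conventional_tag_bits(address_bits, way_size_bytes):
    """Address bits left over after the set index and block offset."""
    return address_bits - log2_exact(way_size_bytes)


def tlb_index_tag_bits(tlb_entries):
    """Bits needed to name one TLB entry instead of storing a full tag."""
    return log2_exact(tlb_entries)


# Hypothetical configuration, not taken from the paper.
ADDRESS_BITS, WAY_SIZE_BYTES, TLB_ENTRIES = 32, 4096, 32

full = conventional_tag_bits(ADDRESS_BITS, WAY_SIZE_BYTES)
short = tlb_index_tag_bits(TLB_ENTRIES)
print(f"conventional tag: {full} bits, TLB-index tag: {short} bits, "
      f"ratio {short / full:.2f}")
```

With these assumed parameters a 20-bit tag shrinks to a 5-bit TLB index. A real design would also have to handle cache lines whose TLB entry has been replaced, which this sketch does not model.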
Citations: 13
Enhancing phase change memory lifetime through fine-grained current regulation and voltage upscaling
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993624
Lei Jiang, Youtao Zhang, Jun Yang
Phase Change Memory (PCM) has recently emerged as a promising memory technology. However, it suffers from limited write endurance. Recent studies have shown that the lifetime of PCM cells depends heavily on the RESET energy. Typically, a larger-than-optimal RESET current is employed to accommodate process variation. This leads to over-programming of cells and a dramatically shortened lifetime. This paper proposes two innovative low-power techniques, Fine-Grained Current Regulation (FGCR) and Voltage Upscaling (VU), to cut down the RESET current, leaving a small number of difficult-to-reset cells unused. We then use error correction codes to rescue those cells. Our experimental results show that FGCR and VU reduce the PCM write power by 33% and prolong the lifetime of a PCM chip by 71%–102%.
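The trade-off described here, lowering the RESET current and leaving a few hard-to-reset cells to error correction, can be illustrated with a toy selection rule. This is not FGCR or VU as defined in the paper; it is a minimal sketch that assumes each cell has a known minimum RESET current and that the error correction code can cover a fixed number of cells per line. All values are invented.

```python
def pick_reset_current(required_ua, correctable_cells):
    """Smallest current that leaves at most `correctable_cells` cells unreset.

    required_ua: per-cell minimum RESET current (hypothetical, in microamps).
    correctable_cells: cells per line the error correction code can cover.
    """
    if correctable_cells < 0:
        raise ValueError("correctable_cells must be non-negative")
    ordered = sorted(required_ua)
    keep = max(len(ordered) - correctable_cells, 0)
    return ordered[keep - 1] if keep else 0


# Invented per-cell requirements for one memory line.
line = [300, 310, 305, 420, 315, 298, 390, 307]

worst_case = pick_reset_current(line, 0)  # must reset every cell
relaxed = pick_reset_current(line, 2)     # two cells left to error correction
print(f"worst-case current: {worst_case} uA, relaxed current: {relaxed} uA")
```

In this invented line, tolerating two unreset cells lets the current drop from the worst-case cell's requirement to that of the sixth-hardest cell.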
Citations: 35
8T Single-ended sub-threshold SRAM with cross-point data-aware write operation
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993631
Yi-Wei Chiu, Jihi-Yu Lin, Ming-Hsien Tu, S. Jou, C. Chuang
This paper presents a new 8T SRAM cell with data-aware cross-point Write operation and a series-connected Read buffer for low-power and low-voltage operation. The cell features a shared footer device to control the VGND for the cell pass-gate (Write) transistors and the Read buffer. The row-based VGND control and the column-based data-aware Write Word-Line form a cross-point Write structure, thus eliminating Write Half-Select Disturb and facilitating a bit-interleaving architecture. A replica-based timing tracking circuit controls the pulse width of the Word-Line Enable (WLE) signal to overcome the large timing variation at low voltage and to reduce the Word-Line active power consumption. A 4 Kbit SRAM test chip implemented in 90 nm HVT CMOS technology operates at 120 MHz at 0.6 V and at 6 MHz at 0.38 V, with a measured power consumption of 2.99 μW at 6 MHz and 0.38 V.
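The cross-point idea can be expressed as a purely behavioral model: a cell's write path is enabled only where an active row control meets an active column control. The sketch below assumes that reading of the abstract and ignores all circuit detail; the function name and the 4-way interleaving are illustrative choices, not taken from the paper.

```python
def write_enabled_cells(num_rows, num_cols, active_row, active_cols):
    """Cells whose row control and column control are both active."""
    return {
        (row, col)
        for row in range(num_rows)
        for col in range(num_cols)
        if row == active_row and col in active_cols
    }


# Hypothetical 4-row by 8-column array with 4-way bit interleaving:
# the bits of word 0 sit in columns 0 and 4.
INTERLEAVE = 4
word_columns = set(range(0, 8, INTERLEAVE))

enabled = write_enabled_cells(4, 8, active_row=1, active_cols=word_columns)
half_selected = {(1, col) for col in range(8)} - enabled
print("write enabled:", sorted(enabled))
print("same row, write path off:", sorted(half_selected))
```

In a row-only scheme every cell on the active row would see the write word-line; here the cells of the other interleaved words stay disabled.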
Citations: 26
Eliminating energy of same-content-cell-columns of on-chip SRAM arrays
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993633
Bushra Ahsan, Lorena Ndreu, I. Sideris, Yiannakis Sazeides, Sachin Idgunji, E. Özer
This work proposes to reduce energy by avoiding access to columns of on-chip SRAM arrays whose cell contents are all 1s or all 0s. We refer to this dynamic phenomenon as the Same-Cell-Content-Column (SCC-column). Analysis reveals that SCC-columns occur frequently in several processor arrays, such as the tag arrays of L1 caches, TLBs and predictors. An interval-based scheme that employs one bit per column is proposed to track whether a column is an SCC-column. We explain how an SCC-column can be leveraged to reduce the energy needed for SRAM read and write accesses. Experimental analysis for a specific processor configuration reveals that the proposed scheme detects SCC-columns effectively. The potential energy savings of the proposed approach at 32 nm often exceed 40% for several processor arrays.
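The property being tracked, a column whose cells are all 1s or all 0s, is easy to state in code. The sketch below only detects such columns in a snapshot of an array stored as integers, one per row; it does not model the interval-based tracking hardware, and the function name is made up for illustration.

```python
def scc_columns(rows, width):
    """Return (all_ones_mask, all_zeros_mask) over `width` bit columns.

    rows: iterable of integers, one per array row.
    A set bit in a mask marks a column whose cells all hold that value.
    """
    mask = (1 << width) - 1
    all_and, all_or = mask, 0
    for word in rows:
        all_and &= word  # a column bit survives only if every row has a 1
        all_or |= word   # a column bit stays clear only if every row has a 0
    return all_and & mask, ~all_or & mask


array = [0b10110010, 0b10010110, 0b10110011]
ones, zeros = scc_columns(array, width=8)
print(f"all-1 columns: {ones:08b}")
print(f"all-0 columns: {zeros:08b}")
```

For this example, columns 7, 4 and 1 hold only 1s and columns 6 and 3 hold only 0s, so five of the eight columns would not need to be accessed.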
Citations: 6
A fast, accurate and simple critical path monitor for improving energy-delay product in DVS systems
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993672
Junyoung Park, J. Abraham
This paper introduces a design scheme that improves Energy-Delay Product (EDP) in conventional Dynamic Voltage Scaling (DVS) systems by exploiting timing margins. To achieve this, we designed a high-speed Critical Path Monitor composed of several Critical Path Replicas, a Timing Checker, and a Toggle Flip-Flop. The replicas are implemented based on our proposed algorithm, which considers the following two facts: (a) the voltage scaling behaviors of logic and interconnect are fundamentally different; (b) various logic gates show different sensitivities to process, temperature, and voltage changes. Because the replicas are connected in parallel by C-elements, the longest of all the replica delays is selected automatically, improving the system response time. If the Timing Checker detects a utilizable margin, the frequency controller increases the system clock frequency in order to improve performance at a given voltage level. Using a 45 nm CMOS technology, we implemented a 32-bit MIPS processor and multiple Critical Path Monitors. The simulation results reveal that our scheme can improve the EDP of conventional DVS by up to 62%.
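The claim that C-elements select the longest replica delay automatically follows from how a Muller C-element behaves: its output changes only once all inputs agree. The following behavioral sketch assumes an ideal multi-input C-element and integer replica delays in picoseconds; the delay values are invented, and none of this reflects the authors' circuit in detail.

```python
def c_element(inputs, previous_output):
    """Ideal Muller C-element: follow the inputs only when they all agree."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return previous_output


def output_rise_time_ps(replica_delays_ps):
    """Time at which the C-element output rises after a common launch edge."""
    output, time_ps = 0, 0
    while True:
        # A replica's output is high once its delay has elapsed.
        arrived = [1 if time_ps >= delay else 0 for delay in replica_delays_ps]
        output = c_element(arrived, output)
        if output:
            return time_ps
        time_ps += 1


delays = [120, 95, 143]  # hypothetical replica delays in picoseconds
print(output_rise_time_ps(delays), max(delays))
```

The output rises exactly when the slowest replica arrives, so no separate comparison logic is needed to find the maximum.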
Citations: 40
IMPACT: IMPrecise adders for low-power approximate computing
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993675
Vaibhav Gupta, Debabrata Mohapatra, S. P. Park, A. Raghunathan, K. Roy
Low power consumption is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, the final output is interpreted by human senses, which are not perfect. This fact obviates the need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage over-scaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate Full Adder (FA) cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units, and evaluate them to demonstrate the efficacy of our approach. Post-layout simulations indicate power savings of up to 60% and area savings of up to 37% with an insignificant loss in output quality, when compared to existing implementations.
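A bit-level model makes the idea of an approximate full adder concrete. One simplification of this kind keeps the carry-out exact and derives the sum bit as its complement; whether this matches any of the authors' cells is not stated in the abstract, so treat the cell below as an illustrative assumption. The sketch also builds a ripple adder that uses the approximate cell only in the low-order bits.

```python
from itertools import product


def exact_fa(a, b, cin):
    """Exact full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def approx_fa(a, b, cin):
    """Assumed approximation: exact carry, sum taken as NOT carry_out."""
    cout = (a & b) | (b & cin) | (a & cin)
    return 1 - cout, cout


def ripple_add(x, y, bits=8, approx_bits=0):
    """Ripple-carry adder using approx_fa in the lowest `approx_bits` bits."""
    carry, result = 0, 0
    for i in range(bits):
        cell = approx_fa if i < approx_bits else exact_fa
        s, carry = cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << bits)


wrong_rows = sum(approx_fa(*v) != exact_fa(*v) for v in product((0, 1), repeat=3))
worst = max(abs(ripple_add(x, y, 8, 2) - (x + y))
            for x in range(256) for y in range(256))
print(f"truth-table rows in error: {wrong_rows} of 8")
print(f"worst-case error with 2 approximate bits: {worst}")
```

Because the carry chain stays exact in this particular cell, errors are confined to the approximate low-order sum bits, which is why restricting approximation to the least significant bits keeps the numerical error small.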
Citations: 450
Variation-aware static and dynamic writability analysis for voltage-scaled bit-interleaved 8-T SRAMs
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993627
Daeyeon Kim, V. Chandra, R. Aitken, D. Blaauw, D. Sylvester
As process technology scales, SRAM robustness is compromised. In addition, lowering the supply voltage to reduce power consumption further reduces the read and write margins. To maintain robustness, a new bitcell topology, the 8-T bitcell, has been proposed, in which read and write operations can be separately optimized. However, it can aggravate the half-select disturb when write word-line boosting is applied or when the bitcell is sized for robust writability. The half-select disturb issue limits the use of the bit-interleaved array configuration required for immunity to soft errors. The opposing characteristics of write operation and half-select disturb generate a new constraint, which should be carefully considered for robust operation of voltage-scaled bit-interleaved 8-T SRAMs. In this paper, we propose a bit-interleaved writability analysis that captures the double-sided constraints placed on the word-line pulse width and voltage level to ensure writability while avoiding the half-select disturb issue. Using the proposed analysis, we investigate the effectiveness of word-line boosting and device sizing optimization in improving bitcell robustness in the low-voltage region. With 57.7% area overhead and 0.1 V of word-line boosting, we achieve 4.6σ VTH mismatch tolerance at 0.6 V, with 41% energy savings.
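The double-sided constraint on the word-line pulse width can be pictured as a window: the pulse must be long enough for the slowest selected cell to be written, yet short enough that the most fragile half-selected cell does not flip. The sketch below assumes per-cell times are available (for example from simulation); all numbers are invented and the function is not the authors' analysis.

```python
def wordline_pulse_window(write_times_ns, half_select_flip_times_ns):
    """Return the (shortest, longest) usable pulse width, or None if empty.

    write_times_ns: time each selected cell needs to complete a write.
    half_select_flip_times_ns: time after which each half-selected cell
    would lose its data if the word-line stayed asserted.
    """
    shortest = max(write_times_ns)            # slowest cell must still write
    longest = min(half_select_flip_times_ns)  # weakest cell must not flip
    return (shortest, longest) if shortest < longest else None


# Invented values for a handful of cells under variation.
print(wordline_pulse_window([1.2, 1.5, 1.1], [2.0, 1.8, 2.4]))
print(wordline_pulse_window([1.5], [1.4]))
```

As variation widens both distributions, the window narrows from both sides and can close entirely, which is the situation the second call illustrates.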
Citations: 26
Pulsed-latch-based clock tree migration for dynamic power reduction
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993601
Hong-Ting Lin, Yi-Lin Chuang, Tsung-Yi Ho
Minimizing the clock tree is known to be an effective approach to reducing power dissipation in modern circuit designs. However, most existing power-aware clock tree synthesis algorithms still focus on optimizing power in flip-flops, which might yield limited power savings. In this work, we explore the use of pulsed latches in clock tree synthesis for further power savings. This is the first work in the literature to propose a novel synthesis algorithm that efficiently migrates a flip-flop-based clock tree into a pulsed-latch-based one. To maintain the performance of a clock tree while simultaneously considering load balance (skew issues), we determine the clock tree topology with a minimum-cost maximum-flow network. Experimental results show that our algorithm can further reduce power consumption by 22% on average compared to approaches without pulsed latches.
Categories and Subject Descriptors: B.7.2 [Integrated Circuits]: Design Aids
General Terms: Algorithms, Design
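A minimum-cost maximum-flow formulation of a load-balanced assignment can be sketched as follows. The network here is an assumption made for illustration (latches assigned to pulse generators, Manhattan distance as cost, a per-generator fan-out limit standing in for load balance); the abstract does not describe the authors' actual network. The solver is a standard successive-shortest-path method.

```python
from collections import deque


class MinCostMaxFlow:
    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, capacity, cost, rev]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Shortest path by cost in the residual graph (queue-based Bellman-Ford).
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v] = dist[u] + c, (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:
                return flow, cost
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the path
                u, i = prev[v]
                push, v = min(push, self.graph[u][i][1]), u
            v = t
            while v != s:  # augment along the path
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign(latches, generators, fanout_limit):
    """Assign each latch to a generator; returns (assignment, total_cost)."""
    first_gen, sink = 1 + len(latches), 1 + len(latches) + len(generators)
    net = MinCostMaxFlow(sink + 1)
    for i, (lx, ly) in enumerate(latches):
        net.add_edge(0, 1 + i, 1, 0)
        for j, (gx, gy) in enumerate(generators):
            net.add_edge(1 + i, first_gen + j, 1, abs(lx - gx) + abs(ly - gy))
    for j in range(len(generators)):
        net.add_edge(first_gen + j, sink, fanout_limit, 0)
    _, total = net.solve(0, sink)
    result = [-1] * len(latches)
    for i in range(len(latches)):
        for v, cap, _, _ in net.graph[1 + i]:
            if first_gen <= v < sink and cap == 0:
                result[i] = v - first_gen  # saturated edge marks the choice
    return result, total


print(assign([(0, 0), (1, 0), (2, 0), (10, 0)], [(0, 0), (10, 0)], 2))
```

In this invented placement, three latches sit closest to the first generator, but the fan-out limit of two forces the cheapest one to move, which is the kind of balance a greedy nearest-neighbor assignment would miss.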
Citations: 15
Does low-power design imply energy efficiency for data centers?
Pub Date : 2011-08-01 DOI: 10.1109/ISLPED.2011.5993621
David Meisner, T. Wenisch
Data center efficiency has quickly become a first-class design goal. In response, many studies have emerged from the academic community and industry using low-power design to help improve the energy efficiency of server hardware. Generally, these proposals assume that low-power design is inherently better for energy efficiency; this preconception stems mostly from the great success in the mobile space of building low-power, energy-efficient systems. We observe that, unlike for mobile devices, constraining a data center server to a low power budget is arbitrary, and higher-power design choices can be more energy efficient. We analyze the energy efficiency design space of past commercial server designs and find that high-power servers are generally more energy efficient than low-power ones. Furthermore, we evaluate building low- or high-power server clusters and find that the small increase in the cost of cooling high-powered servers is easily outweighed by their greater efficiency.
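The distinction between low power and energy efficiency comes down to energy per unit of work rather than watts. The sketch below makes that explicit with two entirely hypothetical servers and an assumed cooling overhead factor; none of the numbers come from the paper, and they are chosen only to show how a higher-power server can still use less energy per request.

```python
def joules_per_request(server_power_w, throughput_rps, cooling_overhead=1.0):
    """Energy per request, with cooling modeled as a multiplier on power."""
    return server_power_w * cooling_overhead / throughput_rps


# Hypothetical servers: (name, power in W, requests per second, cooling factor).
servers = [
    ("low-power", 50.0, 100.0, 1.2),
    ("high-power", 300.0, 1200.0, 1.3),
]

for name, power, throughput, cooling in servers:
    energy = joules_per_request(power, throughput, cooling)
    print(f"{name}: {energy:.3f} J per request")
```

With these made-up figures the high-power server draws six times the power and pays a larger cooling penalty, yet spends about half the energy per request because its throughput is twelve times higher.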
Citations: 27