
Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07): latest papers

An energy efficient GPGPU memory hierarchy with tiny incoherent caches
Alamelu Sankaranarayanan, E. K. Ardestani, J. L. Briz, Jose Renau
With each new generation delivering ever more computing power, GPGPUs have been growing quickly in size, and energy consumption has become a major bottleneck for them. The first-level data cache and the scratchpad memory are critical to GPGPU performance, but they are extremely energy inefficient because of the large number of cores they must serve. This problem could be mitigated by introducing a cache higher up in the hierarchy that serves fewer cores, but doing so introduces cache-coherence issues that may become significant, especially for a GPGPU with hundreds of thousands of in-flight threads. In this paper, we propose adding incoherent tinyCaches between each lane of an SM and the first-level data cache that is currently shared by all lanes of that SM. In a conventional multiprocessor, this would require hardware cache coherence among all SM lanes, capable of handling hundreds of thousands of threads. Our incoherent tinyCache architecture exploits certain unique features of the CUDA/OpenCL programming model to avoid complex coherence schemes. The tinyCache filters out 62% of the memory requests that would otherwise need to be serviced by the DL1G and almost 81% of scratchpad memory requests, yielding a 37% energy reduction in the on-chip memory hierarchy. We evaluate the tinyCache for different memory patterns and show that it is beneficial in most cases.
DOI: 10.1109/ISLPED.2013.6629259; published 2013-09-04
Citations: 11
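The abstract does not spell out the tinyCache organization or exactly which CUDA/OpenCL features make coherence unnecessary, so the following Python sketch rests on stated assumptions. It models a hypothetical per-lane cache of a few fully associative, LRU-managed lines that forwards misses to the shared first-level cache and keeps no coherence state. As a stand-in for the paper's programming-model argument, it assumes writes by other threads only need to become visible at a synchronization point such as a barrier, where the tiny cache is flushed. `TinyCache`, `filter_rate`, and the default sizes are illustrative, not taken from the paper; writes are not modeled.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical per-lane tiny cache: a few fully associative lines, LRU replacement.

    Misses go to the shared first-level cache. There is no coherence state; the
    cache is simply flushed at synchronization points (an assumption).
    """

    def __init__(self, num_lines=4, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # line address -> None, least recently used first

    def access(self, addr):
        """Return True on a hit (request filtered), False on a miss (sent to L1)."""
        line = addr // self.line_bytes
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        self.lines[line] = None
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False

    def flush(self):
        """Drop all lines, e.g. at a barrier, so stale data is never reused."""
        self.lines.clear()


def filter_rate(trace, **cache_kwargs):
    """Fraction of the addresses in `trace` served by the tiny cache."""
    cache = TinyCache(**cache_kwargs)
    hits = sum(cache.access(addr) for addr in trace)
    return hits / len(trace)
```

For a unit-stride trace of 4-byte loads, `filter_rate([4 * i for i in range(32)])` touches four 32-byte lines and hits on 28 of 32 accesses (0.875); the filter rates reported in the paper depend on the locality of real workloads and are not reproduced by this toy.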
Power mapping and modeling of multi-core processors
K. Dev, Abdullah Nazma Nowroz, S. Reda
We propose new techniques for post-silicon power mapping and modeling of multi-core processors using infrared imaging and performance counter measurements. An accurate finite-element modeling framework captures the relationship between temperature and power while compensating for the artifacts introduced by replacing traditional heat-removal mechanisms with oil-based, infrared-transparent cooling. We use thermal conditioning techniques to build leakage power models for the die. Using the power maps obtained from infrared imaging, we develop empirical power models for different processor blocks based on performance monitoring counter (PMC) measurements, and use these PMC-based models to analyze transient power consumption. In our experiments, we capture thermal images of a quad-core processor under different workload conditions and then reconstruct the dynamic and leakage power maps of its blocks. Our results show good accuracy in both mapping and modeling and provide insight into power-consumption trends in multi-core processors.
DOI: 10.5555/2648668.2648680; published 2013-09-04
Citations: 33
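The abstract does not state the form of the PMC-based power models. A common choice, assumed here, is a linear regression of a block's power (which the paper derives from infrared power maps) on per-interval counter rates. The plain-Python sketch below fits such a model by ordinary least squares; `fit_linear_power_model`, `predict_power`, and the counter features are hypothetical.

```python
def fit_linear_power_model(counter_rows, powers):
    """Ordinary least squares for P ~ w0 + sum_i w_i * c_i.

    counter_rows: one tuple of counter rates per sampling interval.
    powers: measured block power for the same intervals.
    Returns [w0, w1, ..., wk].
    """
    X = [[1.0] + [float(c) for c in row] for row in counter_rows]
    n = len(X[0])
    # Normal equations (X^T X) w = X^T P.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * p for x, p in zip(X, powers)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, n):
                    A[r][c] -= factor * A[col][c]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def predict_power(weights, counters):
    """Evaluate the fitted model for one interval's counter rates."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], counters))
```

Each row might hold rates such as instructions retired and cache misses per cycle for one sampling interval (hypothetical features). Solving the normal equations is adequate for a handful of counters; with many correlated counters, a numerically better-conditioned solver would be preferable.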
An analytical solution for multi-core energy calculation with consideration of leakage and temperature dependency
Ming Fan, Vivek Chaturvedi, Shi Sha, Gang Quan
Energy minimization is a critical challenge as IC technology reaches the deep sub-micron level, where the cyclic dependency between leakage power and temperature must be considered. In this paper, we present an analytical method that efficiently and accurately calculates the energy consumption of a given voltage schedule on a multi-core platform, taking the leakage/temperature dependency into account. Our experiments show that the proposed method is 15 times faster than the numerical method, with a relative error of no more than 1.5%.
DOI: 10.1109/ISLPED.2013.6629322; published 2013-09-04
Citations: 8
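To make the leakage/temperature loop concrete, the sketch below considers a single thermal node at steady state with leakage linearized over the operating range as P_leak(T) = alpha + beta * T. Both the single-node simplification and the linear leakage model are assumptions for illustration; the paper's method handles a multi-core platform and a voltage schedule, which this toy does not. It contrasts a numerical fixed-point iteration with the closed-form solution of T = T_amb + R_th * (P_dyn + alpha + beta * T). All function names and parameter values are hypothetical.

```python
def steady_temp_closed_form(t_amb, r_th, p_dyn, alpha, beta):
    """Steady-state temperature with leakage linearized as P_leak(T) = alpha + beta * T.

    Solves T = t_amb + r_th * (p_dyn + alpha + beta * T) directly.
    """
    if r_th * beta >= 1.0:
        raise ValueError("r_th * beta >= 1: thermal runaway, no stable steady state")
    return (t_amb + r_th * (p_dyn + alpha)) / (1.0 - r_th * beta)


def steady_temp_iterative(t_amb, r_th, p_dyn, alpha, beta, tol=1e-9, max_iter=10_000):
    """Numerical baseline: iterate temperature -> leakage -> temperature to a fixed point."""
    temp = t_amb
    for _ in range(max_iter):
        nxt = t_amb + r_th * (p_dyn + alpha + beta * temp)
        if abs(nxt - temp) < tol:
            return nxt
        temp = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def interval_energy(duration, p_dyn, alpha, beta, temp):
    """Energy over an interval held at a constant temperature."""
    return duration * (p_dyn + alpha + beta * temp)
```

With T_amb = 45 °C, R_th = 0.5 K/W, P_dyn = 20 W, alpha = 2 W and beta = 0.1 W/K (made-up values), both functions give 56 / 0.95 ≈ 58.95 °C. The closed form also makes the stability condition explicit: if R_th * beta >= 1, leakage-induced heating outgrows heat removal and there is no stable operating point.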
REEL: Reducing effective execution latency of floating point operations
Vignyan Reddy Kothinti Naresh, S. Gilani, Erika Gunadi, N. Kim, M. Schulte, Mikko H. Lipasti
The height of a program's dynamic dependence graph, as executed by a processor, sets a lower bound on its execution time. This height can be reduced by lowering the effective execution latency of the operations that form dependence chains in the graph. In this paper, we propose a technique called REEL that reduces the overall latency of chains of dependent floating-point (FP) operations by increasing computational throughput. REEL comprises a high-throughput floating-point unit (HFP) that allows early issue of an FP Add that depends on another FP Add or FP Multiply. This is complemented by instruction scheduler modifications that allow early issue of dependent FP Adds, and by novel checker logic that corrects any precision errors. Unlike conventional static operation fusion such as fused Multiply-Add (FMA), REEL requires no instruction-set changes to use the new hardware, and no recompilation is necessary. Furthermore, unlike ISA-level FMA, our technique produces bit-compatible results and speeds up Add-Add dependence pairs in addition to Multiply-Add pairs. Our evaluation of REEL on the CFP2006 benchmarks shows an average performance gain of 7.6% and a maximum gain of 17%, while consuming 1.2% less energy.
DOI: 10.1109/ISLPED.2013.6629292; published 2013-09-04
Citations: 2
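The abstract's starting point, that the height of the dynamic dependence graph bounds execution time, can be illustrated by computing the critical-path latency of a small dependence DAG. The sketch below models REEL's effect only abstractly, as a hypothetical `fp_overlap` by which a dependent FP Add may begin before its FP Add or FP Multiply producer completes. It does not model the HFP datapath or the checker logic, and the latencies are made-up values.

```python
FP_PRODUCERS = ("fadd", "fmul")


def dag_height(ops, latency, fp_overlap=0):
    """Critical-path latency (cycles) of an acyclic dependence graph.

    ops: dict name -> (opcode, list of producer names).
    latency: dict opcode -> execution latency in cycles.
    fp_overlap: hypothetical cycles by which a dependent FP add may start before
        its FP add/multiply producer finishes (0 models a conventional pipeline).
    """
    finish = {}

    def finish_time(name):
        if name not in finish:
            opcode, deps = ops[name]
            start = 0
            for d in deps:
                ready = finish_time(d)
                if opcode == "fadd" and ops[d][0] in FP_PRODUCERS:
                    ready -= fp_overlap  # early issue of the dependent FP add
                start = max(start, ready)
            finish[name] = start + latency[opcode]
        return finish[name]

    return max(finish_time(name) for name in ops)
```

For a multiply feeding two chained adds with 4-cycle latencies, the height drops from 12 to 8 cycles with `fp_overlap=2`, while an add that depends only on a non-FP producer, such as a load, is unaffected.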
Coordinated refresh: Energy efficient techniques for DRAM refresh scheduling
Ishwar Bhati, Zeshan A. Chishti, B. Jacob
As DRAM devices grow in size and speed, the performance and energy overheads of refresh become more significant. To reduce the refresh penalty, we propose techniques referred to collectively as “Coordinated Refresh”, in which the scheduling of low-power modes and refresh commands is coordinated so that most required refreshes are issued while the DRAM device is in its deepest low-power mode, Self Refresh (SR). Our approach saves DRAM background power because the peripheral circuitry and clocks are turned off in SR mode. Averaged across all SPEC CPU 2006 benchmarks, our proposed solutions improve DRAM energy efficiency by 10% compared to the baseline.
DOI: 10.1109/islped.2013.6629295; published 2013-09-04
Citations: 26
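The following toy policy illustrates the coordination idea under simplifying assumptions that are not from the paper. Time is divided into tREFI intervals, each either busy (traffic pending) or idle; while busy, refreshes are postponed up to the limit JEDEC DDR3 allows (8 REF commands); and an idle interval is assumed to be long enough to enter Self Refresh and absorb all owed refreshes there. `schedule_refreshes` is a hypothetical name.

```python
MAX_POSTPONED_REFRESHES = 8  # JEDEC DDR3 permits postponing up to 8 REF commands


def schedule_refreshes(busy_slots, max_postponed=MAX_POSTPONED_REFRESHES):
    """Toy coordinated-refresh policy over a sequence of tREFI intervals.

    busy_slots: one bool per tREFI interval, True if memory traffic is pending.
    Returns (done_in_self_refresh, issued_while_active, still_pending).
    """
    pending = 0
    in_self_refresh = 0
    while_active = 0
    for busy in busy_slots:
        pending += 1  # one refresh falls due every tREFI
        if not busy:
            # Idle interval: enter Self Refresh and let the owed refreshes happen there
            # (assumes one idle interval is long enough to catch up).
            in_self_refresh += pending
            pending = 0
        elif pending > max_postponed:
            # Postponement budget exhausted: issue one REF while the device is active.
            while_active += 1
            pending -= 1
    return in_self_refresh, while_active, pending
```

A baseline controller that issues one REF per tREFI would perform every refresh that falls in a busy interval while the device is active. In this toy, traffic bursts shorter than the postponement budget push all of those refreshes into Self Refresh; for example, `schedule_refreshes([True] * 5 + [False])` returns `(6, 0, 0)`.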
Write intensity prediction for energy-efficient non-volatile caches
Junwhan Ahn, S. Yoo, Kiyoung Choi
This paper presents a novel concept called write intensity prediction for energy-efficient non-volatile caches as well as the architecture that implements the concept. The key idea is to correlate write intensity of cache blocks with addresses of memory access instructions that incur cache misses of those blocks. The predictor keeps track of instructions that tend to load write-intensive blocks and utilizes that information to predict write intensity of blocks. Based on this concept, we propose a block placement strategy driven by write intensity prediction for SRAM/STT-RAM hybrid caches. Experimental results show that the proposed approach reduces write energy consumption by 55% on average compared to the existing hybrid cache architecture.
DOI: 10.1109/ISLPED.2013.6629298; published 2013-09-04
Citations: 33
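The abstract describes correlating a block's write intensity with the memory instruction whose miss brought the block in. A minimal sketch, assuming a PC-indexed table of 2-bit saturating counters (a common predictor structure, not necessarily the paper's), is trained when a block leaves the cache and consulted on a miss to choose SRAM or STT-RAM ways. The class name, table size, write threshold, and indexing function are hypothetical.

```python
class WriteIntensityPredictor:
    """Hypothetical PC-indexed table of saturating counters.

    A counter is trained when a block leaves the cache: up if the block was
    written often while resident, down otherwise. On a miss, the counter of the
    missing load's PC decides whether the block goes to SRAM or STT-RAM ways.
    """

    def __init__(self, table_size=256, write_threshold=4, counter_max=3):
        self.table_size = table_size
        self.write_threshold = write_threshold
        self.counter_max = counter_max
        self.counters = [0] * table_size  # start by predicting "not write-intensive"

    def _index(self, pc):
        return pc % self.table_size  # hypothetical hash: low-order PC bits

    def predict_write_intensive(self, pc):
        return self.counters[self._index(pc)] > self.counter_max // 2

    def place(self, pc):
        """Choose the way type for a block fetched on a miss by the instruction at `pc`."""
        return "SRAM" if self.predict_write_intensive(pc) else "STT-RAM"

    def train(self, pc, writes_while_resident):
        """Update with the PC that brought the block in and how often it was written."""
        i = self._index(pc)
        if writes_while_resident >= self.write_threshold:
            self.counters[i] = min(self.counter_max, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In a cache simulator, `place` would be called on each miss with the missing instruction's PC, and `train` on eviction with the write count observed while the block was resident. The saturating counters keep one unusual block from flipping the prediction for its PC.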
Single-cycle, pulse-shaped critical path monitor in the POWER7+ microprocessor
A. Drake, M. Floyd, Richard L. Willaman, Derek J. Hathaway, J. Hernández, Crystal Soja, Marshall D. Tiner, G. Carpenter, R. Senger
A 32nm SOI critical path monitor (CPM) that can provide timing measurements to a Digital PLL for dynamic frequency adjustments in the 8-core POWER7+™ microprocessor is described. The CPM calibrates to within 2% of cycle time from nominal to turbo voltages. Its voltage sensitivity is 10mV/bit. It tracks processor temperature sensitivity to within 1.5% of nominal frequency, and has a sample jitter less than 1.5% of nominal frequency. The ability to detect noise dynamically allows the system to operate the processor closer to its optimal frequency for any given voltage, resulting in lower voltage for power savings or higher frequency for performance improvements.
DOI: 10.1109/ISLPED.2013.6629293; published 2013-09-04
Citations: 21
Rethinking DC-DC converter design constraints for adaptable systems that target the minimum-energy point
M. Turnquist, Jani Mäkipää, M. Hiienkari, Hanh-Phuc Le, L. Koskinen
This paper explores a new DC-DC converter design constraint for adaptable systems that target the minimum-energy point (MEP). Traditionally, DC-DC converters regulate a fixed output voltage over a wide range of input voltages. For energy-constrained systems that target the MEP, regulating to a fixed voltage is unnecessary, since changes in the output voltage near the MEP have little impact on the energy per cycle. This paper applies both the new and the traditional design constraint to a 3:1 series-parallel switched-capacitor (SC) DC-DC converter in 28 nm CMOS. Compared to the traditional constraint, the new design constraint reduces design time, area, and system-level energy per cycle.
DOI: 10.1109/ISLPED.2013.6629327; published 2013-09-04
Citations: 6
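Why regulation accuracy matters little near the MEP can be seen from a simple energy-per-cycle model. Assume, for illustration only, subthreshold operation with on-current proportional to exp((V - Vth) / (n * vT)). Cycle time then grows as exp(-V / (n * vT)), so leakage energy per cycle, I_leak * V * t_cycle, scales as V^2 * exp(-V / (n * vT)), while dynamic energy scales as V^2. Their sum has a shallow minimum. The sketch below uses normalized, made-up parameters; `energy_per_cycle` and `find_mep` are hypothetical and are not the paper's converter or load model.

```python
import math


def energy_per_cycle(v, activity=0.1, logic_depth=20.0, n=1.5, v_thermal=0.026):
    """Normalized subthreshold energy per cycle (hypothetical model).

    Dynamic part: activity * V^2.
    Leakage part: logic_depth * V^2 * exp(-V / (n * vT)), i.e. leakage current
    times a cycle time that grows exponentially as V drops below threshold.
    """
    return v * v * (activity + logic_depth * math.exp(-v / (n * v_thermal)))


def find_mep(v_min=0.1, v_max=0.6, step=0.001):
    """Grid search for the supply voltage that minimizes energy per cycle."""
    steps = int(round((v_max - v_min) / step))
    return min((v_min + i * step for i in range(steps + 1)), key=energy_per_cycle)
```

With these parameters the minimum falls near 0.23 V, and moving the supply 20 mV either way raises energy per cycle by only about 2%. That flat region is what motivates relaxing the converter's regulation constraint.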
Robustness-driven energy-efficient ultra-low voltage standard cell design with intra-cell mixed-Vt methodology
Wenfeng Zhao, Yajun Ha, Chin Hau Hoo, A. Alvarez
High functional yield is one of the key challenges for subthreshold standard cell designs. Device upsizing is a commonly used but suboptimal method because of its energy and area overheads. In this paper, we propose a robustness-driven intra-cell mixed-Vt design methodology (MVT-ULV) for robust ultra-low voltage operation. It uses low threshold voltage transistors in the weak pulling network of logic gates to enhance robustness, guaranteeing high functional yield with minimal energy/area overhead. On a commercial 65nm CMOS process, we demonstrate that our proposed design methodology achieves robustness improvements of up to 60mV and 110mV at a 300mV supply voltage over commercial library cells and over cells built with previous Leakage-Minimization mixed-Vt methods (MVT-LM), respectively, under the same cell area constraints. In addition, under the same yield constraints, the proposed MVT-ULV library improves the energy efficiency of ITC'99 benchmark circuits by 30.1% and 78.1% on average compared to libraries built with device-upsizing methods and with the previous MVT-LM methods, respectively.
DOI: 10.1109/ISLPED.2013.6629317; published 2013-09-04
Citations: 2
A hybrid display frame buffer architecture for energy efficient display subsystems
Kyungtae Han, Alexander W. Min, Nithyananda S. Jeganathan, Paul Diefenbaugh
Our principal motivation is to reduce the energy consumption of display subsystems in mobile devices by introducing a hybrid frame buffer architecture into the platform. We observed that screen contents are quite static for certain mobile workloads, such as web browsing. As a result, data is read from the display frame buffer much more often than new data is written to it, a property we refer to as read dominance. Based on this observation, we propose a hybrid frame buffer architecture that exploits this read-dominant property to improve the energy efficiency of display subsystems. Specifically, we combine two memory types, DRAM and Phase-Change Memory (PCM), in the display frame buffer to exploit their different read/write energy characteristics. We also analyze the energy efficiency of the hybrid frame buffer using our display content and energy consumption models. Our evaluation results show that the proposed hybrid frame buffer reduces frame buffer energy consumption by up to 43% compared to a conventional DRAM-only frame buffer.
DOI: 10.1109/ISLPED.2013.6629321; published 2013-09-04
Citations: 11
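A first-order per-frame energy comparison shows why read dominance favors a memory with cheap reads and no refresh even if its writes are expensive. The sketch assumes that each display refresh reads the whole frame buffer, that only the updated fraction is rewritten, and that DRAM pays a per-frame background (refresh/standby) cost that PCM avoids. All energy values are made-up, arbitrary units; `MemoryType`, `frame_energy`, and `cheaper_buffer` are hypothetical and do not reproduce the paper's models or its 43% result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryType:
    name: str
    read_energy: float           # per byte, arbitrary units (hypothetical)
    write_energy: float          # per byte, arbitrary units (hypothetical)
    background_per_frame: float  # refresh/standby energy per display frame


def frame_energy(mem, frame_bytes, update_fraction):
    """Energy for one display refresh: full scan-out read plus partial update writes."""
    reads = frame_bytes                     # the display controller reads the whole frame
    writes = update_fraction * frame_bytes  # only changed content is rewritten
    return reads * mem.read_energy + writes * mem.write_energy + mem.background_per_frame


def cheaper_buffer(memories, frame_bytes, update_fraction):
    """Pick the memory type with the lowest per-frame energy for this workload."""
    return min(memories, key=lambda m: frame_energy(m, frame_bytes, update_fraction))
```

For example, with `MemoryType("DRAM", 1.0, 1.0, 500.0)`, `MemoryType("PCM", 1.0, 10.0, 0.0)` and a 1000-byte frame, PCM is cheaper whenever less than about 5.6% of the frame changes per refresh (500 / 9000), which is the mostly static regime the paper associates with workloads such as web browsing.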