
Latest publications from the International Symposium on Quality Electronic Design (ISQED)

On the selection of adder unit in energy efficient vector processing
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523602
Ivan Ratković, Oscar Palomar, Milan Stanic, O. Unsal, A. Cristal, M. Valero
Vector processors are a very promising solution for mobile devices and servers due to their inherently energy-efficient way of exploiting data-level parallelism. Previous research on vector architectures predominantly focused on performance, so vector processors require a new design space exploration to achieve low power. In this paper, we present a design space exploration of the adder unit for vector processors (VA), as it is one of the crucial components in the core design with a non-negligible impact on overall performance and power. For this interrelated circuit-architecture exploration, we developed a novel framework with both architectural- and circuit-level tools. Our framework includes both design-related parameters (e.g., adder family type) and vector-architecture-related parameters (e.g., vector length). Finally, we present guidelines on selecting the most appropriate VA for different types of vector processors according to different sets of metrics of interest. For example, we found that 2-lane configurations are more EDP (Energy×Delay)-efficient than single-lane configurations for low-end mobile processors.
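As a concrete illustration of the metric-driven selection described above, the sketch below ranks a few candidate adder/lane configurations by energy-delay product (EDP) and picks the best one. The configuration names and the energy/delay figures are illustrative placeholders, not values from the paper.

```python
# Minimal sketch: rank candidate vector-adder configurations by EDP.
# All figures below are illustrative placeholders, not results from the paper.

candidates = [
    # (name, energy per vector op [pJ], delay per vector op [ns])
    ("1-lane, ripple-carry",     12.0, 8.0),
    ("1-lane, carry-lookahead",  15.0, 5.0),
    ("2-lane, ripple-carry",     13.0, 4.5),
    ("2-lane, carry-lookahead",  17.0, 3.0),
]

def edp(energy_pj, delay_ns):
    """Energy-delay product: lower is better."""
    return energy_pj * delay_ns

ranked = sorted(candidates, key=lambda c: edp(c[1], c[2]))
for name, e, d in ranked:
    print(f"{name:28s} EDP = {edp(e, d):6.1f} pJ*ns")
print("Best under EDP:", ranked[0][0])
```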
Citations: 8
A system-level solution for managing spatial temperature gradients in thinned 3D ICs
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523595
A. Annamalai, Raghavan Kumar, Arunkumar Vijayakumar, S. Kundu
As conventional CMOS technology approaches its scaling limits, the trend towards stacked 3D Integrated Circuits (3D ICs) is gaining importance. 3D ICs offer reduced power dissipation, higher integration density, heterogeneous stacking and reduced interconnect delays. In a 3D IC stack, all but the bottom tier are thinned down to enable through-silicon vias (TSVs). However, the thinning of the substrate increases the lateral thermal resistance, resulting in higher intra-layer temperature gradients that can lead to performance degradation and even functional errors. In this work, we study the effect of thinning the substrate on the temperature profile of the various tiers in 3D ICs. Our simulation results show that the intra-layer temperature gradient can be as high as 57°C. Often, conventional static solutions lead to highly inefficient designs. To this end, we present a system-level situation-aware integrated scheme that performs opportunistic thread migration and dynamic voltage and frequency scaling (DVFS) to effectively manage thermal violations while increasing the system throughput relative to stand-alone schemes.
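The closing sentence describes a run-time scheme that combines thread migration with DVFS. The sketch below shows one plausible shape of such a control loop, assuming per-core temperature readings; the threshold, the migration rule and the temperature numbers are illustrative assumptions rather than the paper's actual policy.

```python
# Minimal sketch of a situation-aware thermal control epoch:
# migrate the thread on the hottest core to an idle, cooler core,
# otherwise throttle via DVFS. Thresholds and temperatures are
# illustrative assumptions.

THERMAL_LIMIT_C = 85.0

def control_step(core_temps, thread_of_core, dvfs_level):
    """core_temps: dict core -> deg C; thread_of_core: dict core -> thread or None."""
    hottest = max(core_temps, key=core_temps.get)
    coolest = min(core_temps, key=core_temps.get)
    if core_temps[hottest] > THERMAL_LIMIT_C:
        if thread_of_core.get(coolest) is None and hottest != coolest:
            # Opportunistic migration onto the cooler (less constrained) tier.
            thread_of_core[coolest] = thread_of_core[hottest]
            thread_of_core[hottest] = None
        else:
            # No idle cool core available: fall back to a lower V/f step.
            dvfs_level = max(0, dvfs_level - 1)
    return thread_of_core, dvfs_level

# One example epoch with made-up temperatures for a two-tier stack.
temps = {"tier0_core0": 92.0, "tier1_core0": 70.0}
mapping = {"tier0_core0": "threadA", "tier1_core0": None}
print(control_step(temps, mapping, dvfs_level=3))
```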
Citations: 5
On the interactions between real-time scheduling and inter-thread cached interferences for multicore processors
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523676
Yiqiang Ding, Wei Zhang
In a multicore platform, the inter-thread cache interferences can significantly affect the worst-case execution time (WCET) of each real-time task, which is crucial for schedulability analysis. At the same time, the worst-case cache interferences are dependent on how tasks are scheduled to run on different cores, thus creating a circular dependence. In this paper, we present an offline real-time scheduling approach on multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model.
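A minimal sketch of a greedy heuristic in the spirit described above follows: each task is placed on the core where its interference-inflated utilization grows the least. The 10% WCET inflation per co-runner and the task set are illustrative assumptions, not the paper's interference analysis.

```python
# Minimal sketch: greedy assignment of periodic tasks to cores, with a simple
# WCET inflation of 10% per task sharing the L2 cache on the same core.
# The inflation model and the task set are illustrative assumptions.

INTERFERENCE_FACTOR = 0.10

def inflated_wcet(base_wcet, num_co_runners):
    """Pessimistically inflate the WCET for each co-scheduled task."""
    return base_wcet * (1.0 + INTERFERENCE_FACTOR * num_co_runners)

def core_utilization(task_list):
    """Utilization of one core, with every task inflated by its co-runners."""
    return sum(inflated_wcet(w, len(task_list) - 1) / p for _, w, p in task_list)

def greedy_schedule(tasks, num_cores):
    """tasks: list of (name, base_wcet, period); returns dict core -> task list."""
    cores = {c: [] for c in range(num_cores)}
    # Place heavier tasks first, each on the core whose utilization grows least.
    for task in sorted(tasks, key=lambda t: -t[1] / t[2]):
        best = min(range(num_cores),
                   key=lambda c: core_utilization(cores[c] + [task]))
        cores[best].append(task)
    return cores

tasks = [("t1", 2.0, 10.0), ("t2", 3.0, 10.0), ("t3", 1.0, 5.0), ("t4", 4.0, 20.0)]
schedule = greedy_schedule(tasks, num_cores=2)
for core, assigned in schedule.items():
    print(f"core {core}: {assigned}  utilization = {core_utilization(assigned):.2f}")
```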
Citations: 0
Enabling sizing for enhancing the static noise margins
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523623
Valeriu Beiu, A. Beg, W. Ibrahim, F. Kharbash, M. Alioto
This paper suggests a transistor sizing method for classical CMOS gates implemented in advanced technology nodes and operating at low voltages. The method relies on upsizing the length (L) of all transistors uniformly, and balancing the voltage transfer curves (VTCs) for maximizing the static noise margins (SNMs). We use the most well-known CMOS gates (INV, NAND-2, NOR-2) for introducing the novel sizing method, as well as for validating the concept and evaluating its performance. The results show that sizing has not entirely exhausted its potential, allowing designers to go beyond the well-established delay-power tradeoff, as sizing can increase SNMs by: (i) adjusting the threshold voltages (VTH) and their variations (σVTH); and (ii) balancing the VTCs. Simulation results show that this sizing method enables more reliable (i.e., noise-robust and variation-tolerant) CMOS gates, which could operate correctly at very low supply voltages, hence leading to ultra-low voltage/power circuits.
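To make the SNM objective concrete, the sketch below estimates noise margins numerically from a sampled voltage transfer curve using the unity-gain-point definition (NM_L = VIL - VOL, NM_H = VOH - VIH). The logistic-shaped VTC stands in for circuit-simulation data and is an assumption for illustration only.

```python
# Minimal sketch: estimate noise margins from a sampled inverter VTC using
# the unity-gain-point definition (NM_L = VIL - VOL, NM_H = VOH - VIH).
# The logistic-shaped VTC below is an illustrative stand-in for SPICE data.

import numpy as np

VDD = 0.5  # low supply voltage, in volts (assumed)

vin = np.linspace(0.0, VDD, 1001)
vout = VDD / (1.0 + np.exp((vin - VDD / 2) / 0.02))   # idealized inverter VTC

slope = np.gradient(vout, vin)
unity = np.where(np.diff(np.sign(slope + 1.0)))[0]    # indices where slope crosses -1
vil, vih = vin[unity[0]], vin[unity[-1]]              # unity-gain input voltages
voh, vol = vout[unity[0]], vout[unity[-1]]            # corresponding output levels

nm_l = vil - vol
nm_h = voh - vih
print(f"NM_L = {nm_l * 1000:.1f} mV, NM_H = {nm_h * 1000:.1f} mV")
```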
Citations: 16
Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523597
Syed M. A. H. Jafri, Ozan Bag, A. Hemani, Nasim Farahini, K. Paul, J. Plosila, H. Tenhunen
This paper presents a self-adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degrees of parallelism), and the optimal version can only be determined at runtime. For such scenarios, traditional worst-case designs and compile-time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and an autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio which consumes minimum power while meeting the deadlines on available resources. Simulation results using representative applications (matrix multiplication, FIR, and FFT) showed up to 23% and 51% reductions in power and energy, respectively, compared to traditional DVFS designs. Synthesis results confirm a significant reduction in area overheads compared to state-of-the-art DVFS methods.
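The APVFS step can be pictured as choosing, from a table of operating points, the lowest-power (parallelism, voltage, frequency) trio that still meets the application deadline. The sketch below shows that selection; the operating-point table, the linear-speedup assumption and the cycle count are illustrative, not taken from the paper.

```python
# Minimal sketch: choose the lowest-power (parallelism, voltage, frequency)
# operating point that still meets an application's deadline.
# The operating-point table and cycle count below are illustrative assumptions.

operating_points = [
    # (parallel lanes, Vdd [V], freq [MHz], power [mW])
    (1, 0.7, 100, 10.0),
    (1, 0.9, 200, 28.0),
    (2, 0.7, 100, 18.0),
    (2, 0.9, 200, 52.0),
]

def select_point(total_cycles, deadline_ms):
    feasible = []
    for lanes, vdd, f_mhz, power in operating_points:
        # Assume near-linear speedup with the number of lanes.
        exec_ms = total_cycles / lanes / (f_mhz * 1e3)   # cycles / (cycles per ms)
        if exec_ms <= deadline_ms:
            feasible.append((power, lanes, vdd, f_mhz, exec_ms))
    if not feasible:
        return None            # no configuration meets the deadline
    return min(feasible)       # lowest power among the feasible points

# Example: parallelism at low voltage can beat one lane at high voltage.
print(select_point(total_cycles=3_000_000, deadline_ms=20.0))
```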
Citations: 23
Effective thermal control techniques for liquid-cooled 3D multi-core processors
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523583
Yue Hu, Shaoming Chen, Lu Peng, Edward Song, Jin-Woo Choi
Microchannel liquid cooling shows great potential in cooling 3D processors. However, the cooling of 3D processors is limited due to design-time and run-time challenges. Moreover, in new technologies, the processor power density is continually increasing and this will bring more serious challenges to liquid cooling. In this paper, we propose two thermal control techniques: 1) Core Vertically Placed (CVP) technique. According to the architecture of a processor core, two schemes are given for placing a core vertically onto multilayers. The 3D processor with the CVP technique can be better cooled since its separate hotspot blocks have a larger total contact area with the cooler surroundings. 2) Thermoelectric cooling (TEC) technique. We propose to incorporate the TEC technique into the liquid-cooled 3D processor to enhance the cooling of hotspots. Our experiments show the CVP technique reduces the maximum temperature up to 29.58 °C, and 16.64 °C on average compared with the baseline design. Moreover, the TEC technique effectively cools down a hotspot from 96.86 °C to 78.60 °C.
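One run-time aspect of the TEC technique is deciding when to drive the cooler, since the TEC itself consumes power. The sketch below shows a simple hysteresis controller for that decision; the thresholds and the on/off policy are assumptions for illustration, not the paper's control scheme.

```python
# Minimal sketch: hysteresis controller that activates a TEC over a hotspot
# block only when needed. Thresholds are illustrative assumptions.

TEC_ON_C = 90.0    # activate the TEC above this block temperature
TEC_OFF_C = 80.0   # deactivate once the block has cooled below this

def update_tec_state(block_temp_c, tec_on):
    if not tec_on and block_temp_c >= TEC_ON_C:
        return True          # start driving the TEC
    if tec_on and block_temp_c <= TEC_OFF_C:
        return False         # stop driving the TEC
    return tec_on            # keep the current state (hysteresis band)

state = False
for temp in (85.0, 92.0, 88.0, 79.0):
    state = update_tec_state(temp, state)
    print(f"T = {temp:5.1f} C -> TEC {'on' if state else 'off'}")
```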
Citations: 6
Energy-efficient Spin-Transfer Torque RAM cache exploiting additional all-zero-data flags
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523613
Jinwook Jung, Y. Nakata, M. Yoshimoto, H. Kawaguchi
Large on-chip caches account for a considerable fraction of the total energy consumption in modern microprocessors. In this context, emerging Spin-Transfer Torque RAM (STT-RAM) has been regarded as a promising candidate to replace large on-chip SRAM caches by virtue of its zero leakage. However, the large energy requirement of STT-RAM write operations results in a large amount of dynamic energy consumption and precludes its application to on-chip cache designs. In order to reduce the write energy of the STT-RAM cache, and thereby the total energy consumption, this paper provides an architectural technique which exploits the fact that many applications process a large number of zero data. The proposed design appends additional flags to the cache tag arrays and sets these additional bits if the corresponding data in the cache line is zero-valued data in which all data bits are zero. Our experimental results show that the proposed cache design can reduce 73.78% and 69.30% of the dynamic energy on write operations at the byte and word granularities, respectively; total energy consumption is reduced by 36.18% and 42.51%, respectively. In addition to the energy reduction, performance evaluation results indicate that the proposed cache improves processor performance by 5.44% on average.
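A behavioral sketch of the zero-flag idea follows: a write of an all-zero word only sets a flag and skips the costly STT-RAM array write, and a read of a flagged word returns zero without touching the data array. The word granularity and line organization used here are simplified assumptions.

```python
# Minimal behavioral sketch of an all-zero-data flag at word granularity:
# writes of zero words only set a flag (skipping the costly STT-RAM write),
# and reads of flagged words return zero without touching the data array.
# The line/word organization below is a simplified assumption.

WORDS_PER_LINE = 8

class ZeroFlagLine:
    def __init__(self):
        self.words = [0] * WORDS_PER_LINE
        self.zero_flag = [True] * WORDS_PER_LINE   # extra per-word flag bits
        self.array_writes = 0                      # count of real array writes

    def write_word(self, idx, value):
        if value == 0:
            self.zero_flag[idx] = True             # flag only; no array write
        else:
            self.zero_flag[idx] = False
            self.words[idx] = value
            self.array_writes += 1                 # energy-costly STT-RAM write

    def read_word(self, idx):
        return 0 if self.zero_flag[idx] else self.words[idx]

line = ZeroFlagLine()
for i, v in enumerate([0, 7, 0, 0, 3, 0, 0, 0]):
    line.write_word(i, v)
print([line.read_word(i) for i in range(WORDS_PER_LINE)],
      "array writes:", line.array_writes)
```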
Citations: 24
Hybrid CMOS-TFET based register files for energy-efficient GPGPUs
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523598
Zhi Li, Jingweijia Tan, Xin Fu
State-of-the-art General-Purpose computing on Graphics Processing Units (GPGPU) is facing a severe power challenge due to the increasing number of cores placed on a chip with decreasing feature size. In order to hide long-latency operations, GPGPUs employ fine-grained multi-threading among numerous active threads, leading to sizeable register files with massive power consumption. Exploring the optimal power savings in register files becomes the critical first step towards energy-efficient GPGPUs. The conventional method to reduce dynamic power consumption is supply voltage scaling, and inter-band tunneling FETs (TFETs) are promising candidates compared to CMOS for low-voltage operation with regard to both leakage and performance. However, always executing at low voltage (and thus low frequency) will result in significant performance degradation. In this study, we propose hybrid CMOS-TFET based register files. To optimize the register power consumption, we allocate TFET-based registers to threads whose execution progress can be delayed to some degree to avoid memory contention with other threads, while CMOS-based registers are still used for threads requiring normal execution speed. Our experimental results show that the proposed technique achieves 30% energy (including both dynamic and leakage) reduction in register files with little performance degradation compared to a baseline equipped with a naive power optimization technique.
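The allocation policy can be sketched as follows: threads (warps) that are likely to stall on memory anyway are given the slower, low-power TFET-based register banks, while latency-critical threads keep CMOS banks. The stall-fraction heuristic and the bank counts below are illustrative assumptions, not the paper's exact policy.

```python
# Minimal sketch: assign register-file banks to GPU warps based on how
# delay-tolerant they are. The miss-rate-based heuristic and the bank counts
# are illustrative assumptions, not the paper's exact policy.

NUM_TFET_BANKS = 2   # slower but low-power banks
NUM_CMOS_BANKS = 2   # full-speed banks

def allocate_banks(warps):
    """warps: list of (warp_id, expected_memory_stall_fraction)."""
    # Warps that mostly wait on memory can absorb slower TFET register access.
    by_tolerance = sorted(warps, key=lambda w: w[1], reverse=True)
    allocation = {}
    for i, (warp_id, _) in enumerate(by_tolerance):
        if i < NUM_TFET_BANKS:
            allocation[warp_id] = "TFET"
        elif i < NUM_TFET_BANKS + NUM_CMOS_BANKS:
            allocation[warp_id] = "CMOS"
        else:
            allocation[warp_id] = "wait"   # no bank free this interval
    return allocation

warps = [("w0", 0.8), ("w1", 0.1), ("w2", 0.6), ("w3", 0.2)]
print(allocate_banks(warps))
```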
Citations: 11
Power Integrity analysis and discrete optimization of decoupling capacitors on high speed power planes by particle swarm optimization
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523682
J. N. Tripathi, R. Nagpal, N. Chhabra, Rakesh Malik, J. Mukherjee, P. Apte
The Power Integrity problem for a high-speed power plane is discussed in the context of the selection and placement of decoupling capacitors. The S-parameter data of the power-plane geometry and the capacitors, including bulk capacitors and the VRM, are used for accurate analysis of a real-world problem. The optimal capacitors and their optimum locations on the board are found using particle swarm optimization. A novel and accurate methodology is presented which can be used for any high-speed power delivery network.
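As an illustration of the optimization loop, the sketch below runs a small discrete particle swarm over per-site capacitor choices against a toy objective. The catalog, the objective function and the PSO constants are stand-ins for the S-parameter-based impedance model and tuning used in the paper.

```python
# Minimal sketch: particle swarm optimization over discrete decap choices for
# a few candidate sites. The "impedance" objective below is a toy stand-in for
# an S-parameter-based model; catalog values and PSO constants are assumptions.

import random

CATALOG_NF = [0, 10, 47, 100, 220]       # 0 means "leave the site empty"
NUM_SITES = 4
TARGET_NF = 330.0                         # toy target: total capacitance near this

def cost(choice):
    """Toy objective: distance from a target total capacitance plus part count."""
    total = sum(CATALOG_NF[i] for i in choice)
    parts = sum(1 for i in choice if i != 0)
    return abs(total - TARGET_NF) + 5.0 * parts

def pso(num_particles=12, iters=60, w=0.6, c1=1.5, c2=1.5):
    dim = NUM_SITES
    pos = [[random.uniform(0, len(CATALOG_NF) - 1) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost([round(x) for x in p]) for p in pos]
    g = pbest_cost.index(min(pbest_cost))
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(num_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0), len(CATALOG_NF) - 1)
            c = cost([round(x) for x in pos[i]])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return [CATALOG_NF[round(x)] for x in gbest], gbest_cost

print(pso())   # e.g. ([10, 0, 100, 220], 15.0) for one random seed
```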
Citations: 5
Hierarchical dynamic power management using model-free reinforcement learning
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523606
Yanzhi Wang, Maryam Triki, X. Lin, A. Ammari, Massoud Pedram
Model-free reinforcement learning (RL) has become a promising technique for designing a robust dynamic power management (DPM) framework that can cope with variations and uncertainties that emanate from hardware and application characteristics. Moreover, the potentially significant benefit of performing application-level scheduling as part of the system-level power management should be harnessed. This paper presents an architecture for hierarchical DPM in an embedded system composed of a processor chip and connected I/O devices (which are called system components.) The goal is to facilitate saving in the system component power consumption, which tends to dominate the total power consumption. The proposed (online) adaptive DPM technique consists of two layers: an RL-based component-level local power manager (LPM) and a system-level global power manager (GPM). The LPM performs component power and latency optimization. It employs temporal difference learning on semi-Markov decision process (SMDP) for model-free RL, and it is specifically optimized for an environment in which multiple (heterogeneous) types of applications can run in the embedded system. The GPM interacts with the CPU scheduler to perform effective application-level scheduling, thereby, enabling the LPM to do even more component power optimizations. In this hierarchical DPM framework, power and latency tradeoffs of each type of application can be precisely controlled based on a user-defined parameter. Experiments show that the amount of average power saving is up to 31.1% compared to existing approaches.
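A minimal sketch of the model-free, temporal-difference flavor of this approach: a tabular learner picks a sleep timeout for a device, observes the resulting cost, and updates a Q-value per (state, timeout) pair. The state/action discretization and the toy cost model are illustrative assumptions; the paper formulates the problem as an SMDP with component- and system-level managers.

```python
# Minimal sketch: tabular Q-learning of a power-management timeout policy for
# one device. States, actions, and the simulated cost model are illustrative
# assumptions; the paper uses SMDP-based temporal-difference learning.

import random

ACTIONS = [0.1, 1.0, 5.0]       # candidate sleep timeouts in seconds
STATES = ["short_idle", "long_idle"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def simulate_cost(state, timeout):
    """Toy environment: energy spent idling, plus sleep and wake-up overheads
    if the device actually goes to sleep (illustrative numbers)."""
    idle_time = 0.5 if state == "short_idle" else 8.0
    if idle_time <= timeout:
        return idle_time * 1.0               # stayed awake the whole idle period
    return timeout * 1.0 + 0.2 + 2.0         # idle energy + sleep + wake-up penalty

def choose(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return min(ACTIONS, key=lambda a: Q[(state, a)])   # costs: lower is better

for _ in range(5000):
    s = random.choice(STATES)
    a = choose(s)
    cost = simulate_cost(s, a)
    # Simplification: treat the next idle period as starting from the same state.
    best_next = min(Q[(s, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (cost + GAMMA * best_next - Q[(s, a)])

for s in STATES:
    print(s, "-> learned timeout:", min(ACTIONS, key=lambda a: Q[(s, a)]))
```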
Citations: 9