
2007 25th International Conference on Computer Design: Latest Papers

Voltage drop reduction for on-chip power delivery considering leakage current variations
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601883
Jeffrey Fan, N. Mi, S. Tan
In this paper, we propose a novel voltage drop reduction technique for on-chip power delivery networks of VLSI systems in the presence of variational leakage current sources. The new method inserts decoupling capacitors (decaps) into the power grid networks to reduce voltage fluctuation. The optimization is based on a sensitivity-based conjugate gradient method and a sequence-of-linear-programming approach. Unlike existing power grid noise reduction methods, the new approach is the first to consider, during decap optimization, the impact of inter-die and intra-die leakage current variations caused by unavoidable process variability. Leakage currents, although typically static in nature, still add to the total voltage drop, so dynamic voltage drop reduction must account for leakage-induced voltage variations. The proposed algorithm exploits the relatively constant variations across different decap configurations of power grid circuits to speed up the statistical optimization process. Decaps can be inserted in such a way that the resulting circuits have a much higher probability of meeting the voltage drop constraints in the presence of leakage current variations. Experimental results demonstrate the effectiveness of the proposed approach and show that the new method achieves a 100X to 1,000X speedup over Monte Carlo based statistical decap optimization.
Citations: 6
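To make the Monte Carlo baseline mentioned in the abstract above concrete, here is a minimal sketch of how a statistical voltage drop check could look. It is not the authors' algorithm: it models only the static leakage-induced IR drop on a toy 1-D rail, ignores decaps and transient behavior, and every name and number (`node_drops`, `mc_yield`, the Gaussian variation model, the rail values) is an assumption made for illustration.

```python
import random


def node_drops(currents, r_seg):
    """Static IR drop at each node of a 1-D rail fed from one end.

    currents[i] is the current drawn at node i; every rail segment has
    resistance r_seg. Segment k carries the current of all nodes >= k.
    """
    drops = []
    downstream = sum(currents)  # current flowing through the first segment
    drop = 0.0
    for i_node in currents:
        drop += r_seg * downstream
        drops.append(drop)
        downstream -= i_node
    return drops


def mc_yield(nominal, sigma_rel, r_seg, v_max, samples=2000, seed=0):
    """Fraction of sampled dies whose worst node drop stays within v_max."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(samples):
        # Assumed model: one inter-die shift shared by all nodes, plus an
        # independent intra-die term per node (both relative, Gaussian).
        inter_die = rng.gauss(0.0, sigma_rel)
        currents = [
            max(0.0, i_nom * (1.0 + inter_die + rng.gauss(0.0, sigma_rel)))
            for i_nom in nominal
        ]
        if max(node_drops(currents, r_seg)) <= v_max:
            passed += 1
    return passed / samples


if __name__ == "__main__":
    leak = [1e-3] * 8  # hypothetical nominal leakage per node, in amperes
    print(mc_yield(leak, sigma_rel=0.2, r_seg=0.5, v_max=0.02))
```

Each yield estimate needs thousands of grid evaluations, and an optimizer would repeat that for every candidate decap configuration, which is the cost the abstract's reported speedup is measured against.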
Automatic SystemC TLM generation for custom communication platforms
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601878
Lochi Yu, S. Abdi
This paper presents a tool for automatic generation of transaction level models (TLMs) in SystemC for MPSoC designs with custom communication platforms. The MPSoC platform is captured as a graphical netlist of components, busses and bridge elements. The application is captured as C processes mapped to the platform components. Once the platform is decided, a set of transaction level communication APIs is automatically generated for each application C process. After the C code is input, an executable SystemC TLM of the design is automatically generated using our tool. This TLM can be executed using standard SystemC simulators for early functional verification of the design. Although several TLM styles and standards have been proposed in the past, our approach differs in that designers do not need to understand the underlying SystemC code or TLM modeling style to verify that their application executes on the selected platform. Another key advantage of our tool is that the platform can be easily customized for the application and a new TLM for that platform can be automatically generated. The TLM can be used to program the custom platform early in the design cycle, before the components are available. Our experimental results demonstrate that for large industrial applications such as an MP3 decoder and H.264, high-speed TLMs can be generated for several platforms in a few seconds.
Citations: 6
System level power estimation methodology with H.264 decoder prediction IP case study
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601959
Young-Hwan Park, S. Pasricha, F. Kurdahi, N. Dutt
This paper presents a methodology to generate a hierarchy of power models for power estimation of custom hardware IP blocks, enabling a trade-off between power estimation accuracy, modeling effort and estimation speed. Our power estimation approach enables several novel system-level explorations (such as observing the effect of clock gating, and the effects of tweaking application-level parameters on system power) with an estimation accuracy that is close to gate level. We implemented our methodology on an H.264 video decoder prediction IP case study, created power models, and evaluated the effects of varying design parameters (e.g., clock gating, I/P frame ratios, quantization), allowing rapid system-level power exploration of these design parameters.
Citations: 14
Energy-aware co-processor selection for embedded processors on FPGAs
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601895
A. H. Gholamipour, E. Bozorgzadeh, Sudarshan Banerjee
In this paper, we present the co-processor selection problem for minimum energy consumption in hw/sw co-design on FPGAs with dual power modes. We provide a theoretical analysis of the problem under no constraint, a resource constraint, and a timing constraint. We prove that the problem is NP-hard in each case and provide a generalized ILP formulation. We compare the energy achieved by our approach with that of other approaches that do not consider both static and dynamic power during optimization, and show that we can reduce energy by 63% in some cases.
Citations: 5
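The abstract above argues that selection must weigh static and dynamic power together. The sketch below illustrates that point with a toy energy model and exhaustive search under an area (resource) constraint. It is not the paper's ILP formulation and does not model the dual power modes; the energy model, the field names (`dt`, `e_dyn`, `p_static`, `area`) and all numbers are assumptions for illustration only.

```python
from itertools import combinations


def total_energy(chosen, t_base, p_cpu):
    """Energy of one run: CPU energy + co-processor dynamic + static energy.

    Assumed model: each co-processor shortens the run by dt, costs e_dyn
    once, and leaks p_static for the whole (shortened) run time.
    """
    run_time = t_base - sum(c["dt"] for c in chosen)
    dynamic = sum(c["e_dyn"] for c in chosen)
    static = sum(c["p_static"] for c in chosen) * run_time
    return p_cpu * run_time + dynamic + static


def best_selection(candidates, t_base, p_cpu, area_limit):
    """Exhaustively search all subsets that fit within area_limit."""
    best = ((), total_energy([], t_base, p_cpu))
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(c["area"] for c in subset) > area_limit:
                continue
            energy = total_energy(subset, t_base, p_cpu)
            if energy < best[1]:
                best = (tuple(c["name"] for c in subset), energy)
    return best


if __name__ == "__main__":
    cands = [  # hypothetical co-processor candidates
        {"name": "A", "dt": 4.0, "area": 2, "p_static": 0.1, "e_dyn": 1.0},
        {"name": "B", "dt": 3.0, "area": 2, "p_static": 0.5, "e_dyn": 0.5},
    ]
    print(best_selection(cands, t_base=10.0, p_cpu=1.0, area_limit=4))
    print(best_selection(cands, t_base=10.0, p_cpu=1.0, area_limit=2))
```

In this toy instance, candidate B on its own costs more energy than using no co-processor at all, because its static power outweighs the time it saves; an optimizer that only counted dynamic energy would miss that. Exhaustive search is exponential in the number of candidates, which is consistent with the NP-hardness result and is why an ILP solver is the practical choice.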
Two-level data prefetching
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601908
Fei Gao, Hanyu Cui, S. Sair
Data prefetching has been shown to be an effective tool for hiding part of the latency associated with cache misses in modern processors. Traditionally, data prefetchers fetch data either into a small prefetch buffer near the L1 for low latency, or into the L2 cache for greater coverage and less cache pollution. However, with the L1-L2 cache speed gap growing, significant performance gains can be obtained if the data prefetcher can operate as aggressively as an L2-level prefetcher but with the fast hit times of an L1-level prefetcher. In this paper, we propose a prefetching framework where an L1-level prefetcher and an L2-level prefetcher work cooperatively to reduce the average access time more than either one alone can. We evaluate several design alternatives suited to perform synergistically under different workloads. From the insight we gather from this analysis, we propose a confidence-based adaptive prefetcher that can improve prefetch efficiency significantly with judicious use of available bus bandwidth. Our results show that for certain prefetcher combinations, two-level prefetching can achieve the cumulative speedup attained from either prefetcher alone. Furthermore, when compared to other two-level prefetching models, the adaptive design provides similar speedups with appreciably less bus traffic.
Citations: 1
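The "average access time" that the two prefetchers in the abstract above aim to reduce can be illustrated with the textbook average memory access time (AMAT) formula for a two-level cache. The sketch below is not taken from the paper: the latencies and miss rates are hypothetical, and treating an L1-level prefetcher as lowering the L1 miss rate and an L2-level prefetcher as lowering the L2 miss rate, independently of each other, is a simplifying assumption.

```python
def amat(t_l1, t_l2, t_mem, miss_l1, miss_l2):
    """Average memory access time for a two-level cache hierarchy.

    t_l1, t_l2, t_mem: access latencies in cycles.
    miss_l1: fraction of accesses that miss in L1.
    miss_l2: fraction of L1 misses that also miss in L2 (local miss rate).
    """
    return t_l1 + miss_l1 * (t_l2 + miss_l2 * t_mem)


if __name__ == "__main__":
    lat = dict(t_l1=2, t_l2=12, t_mem=200)  # hypothetical latencies
    print("no prefetch:  ", amat(miss_l1=0.10, miss_l2=0.30, **lat))
    print("L2-level only:", amat(miss_l1=0.10, miss_l2=0.15, **lat))
    print("L1-level only:", amat(miss_l1=0.06, miss_l2=0.30, **lat))
    print("both levels:  ", amat(miss_l1=0.06, miss_l2=0.15, **lat))
```

With these assumed numbers, lowering both miss rates shortens the average access time more than lowering either one alone, which is the kind of cooperation the abstract describes. A real evaluation would also have to account for cache pollution and bus bandwidth, which this formula ignores.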
CAP: Criticality analysis for power-efficient speculative multithreading
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601932
James Tuck, Wei Liu, J. Torrellas
While speculative multithreading (SM) on a chip multiprocessor (CMP) has the ability to speed up hard-to-parallelize applications, the power inefficiency of aggressive speculation is a concern. To improve SM's power efficiency, we note that not all the tasks running in an SM environment are equally critical. To leverage this insight, this paper develops a novel, widely applicable task-criticality model for SM. It also proposes CAP, a novel architecture that builds a task-criticality graph dynamically and uses it to make scheduling decisions in an SM CMP. Experiments with SPECint, SPECfp, and Olden applications show that, in a CMP with one fast core and three slow ones, the E×D² with CAP is, on average, 91-95% of that without. Moreover, it is only 77-91% of the E×D² of a CMP with four fast cores and no CAP. Overall, we argue that scheduling for task criticality is beneficial.
Citations: 8
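The results in the abstract above are reported as ratios of the energy-delay-squared product, a common metric that multiplies energy by the square of execution time so that slowdowns are penalized more heavily than energy savings are rewarded. A minimal sketch of how such a ratio is computed follows; the function names and the example numbers are assumptions and are not measurements from the paper.

```python
def ed2(energy, delay):
    """Energy-delay-squared product: lower is better."""
    return energy * delay ** 2


def relative_ed2(energy, delay, ref_energy, ref_delay):
    """ED^2 of a configuration as a fraction of a reference configuration."""
    return ed2(energy, delay) / ed2(ref_energy, ref_delay)


if __name__ == "__main__":
    # Hypothetical: a scheduler that saves 15% energy but runs 4% longer.
    print(relative_ed2(energy=0.85, delay=1.04, ref_energy=1.0, ref_delay=1.0))
```

In this made-up case the 15% energy saving shrinks to roughly an 8% improvement in the metric, because the 4% slowdown is squared.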
Power-aware mapping for reconfigurable NoC architectures
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601933
M. Modarressi, H. Sarbazi-Azad
A core mapping method for reconfigurable network-on-chip (NoC) architectures is presented in this paper. In most existing methods, mapping is carried out based on the traffic characteristics of a single application. However, modern complex systems-on-chip implement and integrate several different applications, which mapping methods should take into account. In the proposed method, reconfiguration (achieved by embedding programmable switches between the routers of a mesh-based NoC) allows us to dynamically change the network topology in order to adapt it to the running application and optimize the power and performance metrics. The presented network architecture can be configured as an application-specific topology, while still retaining the benefits of regular NoC topologies such as modularity and predictable electrical properties. The experimental results show that this method can effectively adapt the NoC to the running application and improve the power consumption and performance of the system.
Citations: 32
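To show what "mapping based on traffic characteristics" in the abstract above can mean in practice, the sketch below scores a core placement on a plain 2-D mesh by summing bandwidth times hop count over all traffic flows, a widely used proxy for communication energy. It is not the paper's method: the reconfigurable switches are not modeled, and the core names, bandwidths and placements are hypothetical.

```python
def hops(a, b):
    """Manhattan hop count between two mesh tiles given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def comm_cost(flows, placement):
    """Sum of bandwidth x hops over all (src, dst, bandwidth) flows."""
    return sum(bw * hops(placement[src], placement[dst])
               for src, dst, bw in flows)


if __name__ == "__main__":
    flows = [("cpu", "mem", 100), ("cpu", "dsp", 10), ("dsp", "mem", 40)]
    far = {"cpu": (0, 0), "mem": (1, 1), "dsp": (0, 1)}
    near = {"cpu": (0, 0), "mem": (0, 1), "dsp": (1, 1)}
    print(comm_cost(flows, far), comm_cost(flows, near))
```

A placement that is good for one application's flows can be poor for another's, which is the motivation the abstract gives for adapting the topology to the running application instead of fixing a single mapping.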
Reducing leakage power in peripheral circuits of L2 caches
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601907
H. Homayoun, A. Veidenbaum
Leakage power has grown significantly and is a major challenge in microprocessor design. Leakage is the dominant power component in second-level (L2) caches. This paper presents two architectural techniques to utilize leakage reduction circuits in L2 caches. They primarily target the leakage in the peripheral circuitry of an L2 cache and as such have to be able to cope with longer delays. One technique exploits the fact that processor activity decreases significantly after an L2 cache miss occurs and saves power during L2 miss service time. Two algorithms, a static one and an adaptive one, are proposed for deciding when to apply this leakage reduction technique. Another technique attempts to keep the peripheral circuits in a lower-power state most of the time. The results for SPEC2K benchmarks show that the first technique can achieve an 18 to 22% reduction in L2 power consumption, on average (and up to 63%), depending on the decision algorithm. The second technique can save 25%, on average (and up to 80%). This comes with a negligible 1 to 2% performance impact, on average, depending on the technique used.
Citations: 11
Scan chain design for three-dimensional integrated circuits (3D ICs)
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601902
Xiaoxia Wu, P. Falkenstern, Yuan Xie
Scan chains are widely used to improve the testability of IC designs. In traditional 2D IC designs, various techniques for constructing scan chains have been proposed to facilitate DFT (Design-For-Test). Recently, three-dimensional (3D) technologies have been proposed as a promising solution to continue technology scaling. In this paper, we study scan chain construction for 3D ICs, examining the impact of 3D technologies on scan chain ordering. Three different 3D scan chain design approaches (namely, VIA3D, MAP3D, and OPT3D) are proposed and compared, with experimental results for ISCAS89 benchmark circuits. The advantages and disadvantages of each approach are discussed. The results show that both the MAP3D and VIA3D approaches require no changes to 2D scan chain algorithms, but OPT3D achieves the best wire length reduction for the scan chain design. The average scan chain wire length obtained with OPT3D for six ISCAS89 benchmarks is 46.0% shorter than that of the 2D scan chain design. To the best of our knowledge, this is the first study on scan chain design for 3D integrated circuits.
Citations: 67
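The effect of the third dimension on scan chain ordering, which the abstract above studies, can be seen in a small sketch. It orders scan cells with a greedy nearest-neighbor heuristic and measures wire length with a 3-D Manhattan distance in which crossing a layer costs a fixed amount. This is not VIA3D, MAP3D or OPT3D; the distance model, the `via_cost` parameter, the heuristic and the cell coordinates are all assumptions for illustration.

```python
def dist(a, b, via_cost):
    """Manhattan distance between cells (x, y, layer); each layer crossing
    adds via_cost (a hypothetical stand-in for an inter-layer via)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + via_cost * abs(a[2] - b[2])


def chain_length(cells, order, via_cost):
    """Total wire length of a scan chain visiting cells in the given order."""
    return sum(dist(cells[i], cells[j], via_cost)
               for i, j in zip(order, order[1:]))


def greedy_order(cells, via_cost, start=0):
    """Nearest-neighbor ordering; ties are broken by lower cell index."""
    order = [start]
    remaining = set(range(len(cells))) - {start}
    while remaining:
        last = cells[order[-1]]
        nxt = min(remaining, key=lambda i: (dist(last, cells[i], via_cost), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    cells = [(0, 0, 0), (10, 0, 0), (0, 1, 1), (10, 1, 1)]
    layer_by_layer = [0, 1, 3, 2]  # finish layer 0 before moving to layer 1
    free_3d = greedy_order(cells, via_cost=0.5)
    print(chain_length(cells, layer_by_layer, 0.5), layer_by_layer)
    print(chain_length(cells, free_3d, 0.5), free_3d)
```

In this toy layout, letting the chain hop between layers whenever a vertically adjacent cell is closer yields a shorter chain than finishing one layer before moving to the next. Nearest-neighbor ordering is only a heuristic and gives no optimality guarantee.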
Hybrid resistor/FET-logic demultiplexer architecture design for hybrid CMOS/nanodevice circuits
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601955
Shu Li, Tong Zhang
Hybrid nanoelectronics are emerging as one viable option to sustain Moore's Law after the CMOS scaling limit is reached. One main design challenge in hybrid nanoelectronics is the interface (called the demux) between the highly dense nanowires in nanodevice crossbars and the relatively coarse microwires in the CMOS domain. Prior work on demux design uses a single type of device to realize the demultiplexing function, but hardly provides a satisfactory solution. This work proposes combining resistors with FETs to implement the demux, leading to the so-called hybrid resistor/FET-logic demux. Such a hybrid demux architecture lets the two types of devices complement each other to improve the overall demux design effectiveness. Furthermore, the effects of resistor conductance variability are analyzed and evaluated based on computer simulations.
Citations: 2