2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Papers

System-level application-aware dynamic power management in adaptive pipelined MPSoCs for multimedia
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105394
Haris Javaid, M. Shafique, J. Henkel, S. Parameswaran
System-level dynamic power management (DPM) schemes in Multiprocessor Systems-on-Chip (MPSoCs) exploit processor idleness to reduce energy consumption by putting idle processors into low-power states. In the presence of multiple low-power states, the challenge is to predict the duration of the idle period with high accuracy so that the most beneficial power state can be selected for the idle processor. In this work, we propose a novel dynamic power management scheme for adaptive pipelined MPSoCs, suitable for multimedia applications. We leverage application knowledge in the form of future workload prediction to forecast the duration of idle periods. The predicted duration is then used to select an appropriate power state for the idle processor. We propose five heuristics as part of the DPM and compare their effectiveness using an MPSoC implementation of the H.264 video encoder supporting HD720p at 30 fps. The results show that one of the application-prediction-based heuristics (MAMAPBH) predicted the most beneficial power states for idle processors with less than 3% error when compared to an optimal solution. In terms of energy savings, MAMAPBH was always within 1% of the energy savings of the optimal solution. When compared with a naive approach (where only one of the possible power states is used for all the idle processors), MAMAPBH achieved up to 40% more energy savings with only 0.5% degradation in throughput. These results signify the importance of leveraging application knowledge at the system level for dynamic power management schemes.
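The state-selection step in this abstract can be illustrated with a minimal Python sketch. It is not the paper's MAMAPBH heuristic; it only shows how a predicted idle duration might drive the choice among several low-power states by comparing the energy each state would cost. The state names, power figures, and transition overheads are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    power_mw: float        # power drawn while resident in this state
    transition_uj: float   # energy overhead to enter and leave the state
    transition_ms: float   # time overhead to enter and leave the state


# Hypothetical states, ordered from shallow to deep (values are illustrative).
STATES = [
    PowerState("idle", 100.0, 0.0, 0.0),
    PowerState("clock_gated", 40.0, 50.0, 0.5),
    PowerState("power_gated", 5.0, 600.0, 5.0),
]


def idle_energy_uj(state, idle_ms):
    """Energy (mW * ms = uJ) spent if `state` is used for the whole idle period."""
    if state.transition_ms > idle_ms:
        return float("inf")  # the state cannot be entered and left in time
    return state.transition_uj + state.power_mw * (idle_ms - state.transition_ms)


def select_state(states, predicted_idle_ms):
    """Pick the state with the lowest estimated energy for the predicted idle time."""
    return min(states, key=lambda s: idle_energy_uj(s, predicted_idle_ms))


for idle_ms in (0.3, 10.0, 20.0):
    print(idle_ms, select_state(STATES, idle_ms).name)
```

With these made-up numbers, a short predicted idle period keeps the processor in the shallowest state, and only a long one justifies the deepest state. A misprediction in either direction wastes energy, which is why the accuracy of the idle-period forecast matters.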
Citations: 34
CIPARSim: Cache intersection property assisted rapid single-pass FIFO cache simulation technique
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105316
M. S. Haque, Jorgen Peddersen, S. Parameswaran
An application's cache miss rate is used in timing analysis, in system performance prediction, and in deciding the best cache memory for an embedded system to meet tighter constraints. Single-pass simulation allows a designer to find the number of cache misses quickly and accurately on various cache memories. Such single-pass simulation systems have previously relied heavily on cache inclusion properties, which allowed rapid simulation of cache configurations for different applications. Thus far, the only inclusion properties discovered were applicable to caches based on the Least Recently Used (LRU) replacement policy. However, LRU-based caches are rarely implemented in real life due to their circuit complexity at larger cache associativities. Embedded processors typically use a FIFO replacement policy in their caches instead, for which there are no full inclusion properties to exploit. In this paper, for the first time, we introduce a cache property called the “Intersection Property” that helps to reduce single-pass simulation time in a manner similar to an inclusion property. An intersection property defines conditions that, if met, prove a particular element exists in larger caches, thus avoiding further search time. We discuss three such intersection properties for caches using the FIFO replacement policy. A rapid single-pass FIFO cache simulator, “CIPARSim”, is also proposed. CIPARSim is the first single-pass simulator that relies on FIFO cache properties to reduce simulation time significantly. For the cache configurations tested, CIPARSim was up to 5 times faster (on average 3 times faster) than the state-of-the-art single-pass FIFO cache simulator. CIPARSim produces the cache hit and miss rates of an application accurately on various cache configurations. During simulation, CIPARSim's intersection properties alone predict up to 90% (on average 65%) of the total hits, reducing simulation time immensely.
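The remark that FIFO caches offer no full inclusion property can be seen in a small experiment. The sketch below is a plain per-configuration FIFO simulator, not CIPARSim and not the intersection properties themselves; the function name `fifo_misses` and the trace are illustrative assumptions. On the classic Belady reference string, a larger fully associative FIFO cache incurs more misses than a smaller one, so the contents of the small cache cannot always be a subset of the contents of the larger one.

```python
from collections import deque


def fifo_misses(trace, num_sets, assoc, block_size=1):
    """Count misses of a set-associative cache with FIFO replacement."""
    sets = [deque() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_size
        ways = sets[block % num_sets]
        if block in ways:
            continue  # hit: unlike LRU, FIFO does not reorder on a hit
        misses += 1
        if len(ways) == assoc:
            ways.popleft()  # evict the block that entered the set first
        ways.append(block)
    return misses


# Classic reference string used to demonstrate Belady's anomaly.
TRACE = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]

for assoc in (3, 4):
    print(assoc, fifo_misses(TRACE, num_sets=1, assoc=assoc))
```

Running one such simulation per configuration is the multi-pass baseline that single-pass simulators aim to avoid. A single-pass tool evaluates many configurations while reading the trace once, which is where a property that proves presence in larger caches saves search time.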
Citations: 11
Power grid analysis with hierarchical support graphs
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105383
Xueqian Zhao, Jia Wang, Zhuo Feng, Shiyan Hu
It is increasingly challenging to analyze present-day large-scale power delivery networks (PDNs) due to the drastically growing complexity of power grid design. To achieve greater runtime and memory efficiency, a variety of preconditioned iterative algorithms have been investigated in the past few decades with promising performance, while incremental power grid analysis has also become popular for facilitating fast re-simulation of corrected designs. Although existing preconditioned solvers, such as those using incomplete matrix factor-based preconditioners, usually exhibit high efficiency in memory usage, their convergence behavior is not always satisfactory. In this work, we present a novel hierarchical support-graph preconditioned iterative algorithm that constructs preconditioners by generating spanning trees in power supply networks for fast power grid analysis. The support-graph preconditioner is efficient for handling complex power grid structures (regular or irregular grids), and can facilitate very fast incremental analysis. Our experimental results on IBM power grid benchmarks show that, compared with the best direct or iterative solvers, the proposed support-graph preconditioned iterative solver achieves speedups of up to 3.6X for DC analysis and up to 22X for incremental analysis, while reducing the memory consumption by a factor of four.
Citations: 26
Modeling the computational efficiency of 2-D and 3-D silicon processors for early-chip planning
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105347
M. Grange, A. Jantsch, R. Weerasekera, D. Pamunuwa
Hierarchical models from the physical to the system level are proposed for architectural exploration of high-performance silicon systems to quantify the performance and cost trade-offs for 2-D and 3-D IC implementations. We show that 3-D systems can reduce interconnect delay and energy by up to an order of magnitude over 2-D, with an increase of 20–30% in performance-per-watt for every doubling of stack height. Contrary to previous analysis, the improved energy efficiency is achievable at a favorable cost. The models are packaged as a standalone tool and can provide fast estimation of coarse-grain performance and cost limitations for a variety of processing systems, for use at the early chip-planning phase of the design cycle.
Citations: 4
MACACO: Modeling and analysis of circuits for approximate computing
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105401
Rangharajan Venkatesan, A. Agarwal, K. Roy, A. Raghunathan
Approximate computing, which refers to a class of techniques that relax the requirement of exact equivalence between the specification and implementation of a computing system, has attracted significant interest in recent years. We propose a systematic methodology, called MACACO, for the Modeling and Analysis of Circuits for Approximate Computing. The proposed methodology can be utilized to analyze how an approximate circuit behaves with reference to a conventional correct implementation, by computing metrics such as worst-case error, average-case error, error probability, and error distribution. The methodology applies to both timing-induced approximations such as voltage over-scaling or over-clocking, and functional approximations based on logic complexity reduction. The first step in MACACO is the construction of an equivalent untimed circuit that represents the behavior of the approximate circuit at a given voltage and clock period. Next, we construct a virtual error circuit that represents the error in the approximate circuit's output for any given input or input sequence. Finally, we apply conventional Boolean analysis techniques (SAT solvers, BDDs) and statistical techniques (Monte-Carlo simulation) in order to compute the various metrics of interest. We have applied the proposed methodology to analyze a range of approximate designs for datapath building blocks. Our results show that MACACO can help a designer to systematically evaluate the impact of approximate circuits, and to choose between different approximate implementations, thereby facilitating the adoption of such circuits for approximate computing.
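The error metrics named in this abstract (worst-case error, average-case error, error probability) can be made concrete on a toy example. The sketch below is an illustration under stated assumptions, not the MACACO flow: it uses exhaustive enumeration over a small input space in place of SAT solvers or BDDs, and the approximate adder (lower `k` bits computed with a bitwise OR, no carry into the upper part) is a hypothetical functional approximation chosen for brevity. The function names are likewise made up for this example.

```python
from itertools import product


def approx_add(a, b, k):
    """Hypothetical approximate adder: OR the low k bits, add the rest exactly."""
    mask = (1 << k) - 1
    low = (a | b) & mask                  # cheap stand-in for the low-order sum
    high = ((a >> k) + (b >> k)) << k     # exact sum of the upper bits, no carry-in
    return high | low


def error_metrics(width, k):
    """Compare the approximate adder with exact addition over all input pairs."""
    errors = [
        abs(approx_add(a, b, k) - (a + b))  # plays the role of the error circuit
        for a, b in product(range(1 << width), repeat=2)
    ]
    n = len(errors)
    return {
        "worst_case": max(errors),
        "average": sum(errors) / n,
        "error_probability": sum(e != 0 for e in errors) / n,
    }


print(error_metrics(width=4, k=2))
```

For `width=4, k=2` the error equals `a & b` restricted to the low two bits, which gives a worst-case error of 3, an average error of 0.75, and an error probability of 0.4375 under uniformly distributed inputs. Enumeration stops being feasible at realistic bit widths, which is the motivation for symbolic or statistical analysis.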
Citations: 257
Temperature aware statistical static timing analysis
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105313
A. Rogachev, Lu Wan, Deming Chen
With technology scaling, the variability of device parameters continues to increase. This impacts both the performance and the temperature profile of the die, turning them into statistical distributions. To the best of our knowledge, no one has considered the impact of the statistical thermal profile during statistical analysis of the propagation delay. We present a statistical static timing analysis (SSTA) tool that considers this interdependence and produces accurate timing estimation. Our average errors for mean and standard deviation are 0.95% and 3.5%, respectively, when compared against Monte Carlo simulation. This is a significant improvement over SSTA that assumes a deterministic power profile, whose mean and SD errors are 3.7% and 20.9%, respectively. However, when considering >90% performance yield, our algorithm's accuracy improvement was not as significant when compared to the deterministic power case. Thus, if one is concerned with the runtime, a reasonable estimate of the performance yield can be obtained by assuming nominal power. Nevertheless, a full statistical analysis is necessary to achieve maximum accuracy.
Citations: 12
A robust architecture for post-silicon skew tuning
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105417
Mac Y. C. Kao, Kun-Ting Tsai, Shih-Chieh Chang
Clock skew minimization is important in the VLSI design field. Due to the presence of Process, Voltage, and Temperature (PVT) variations, the Post-Silicon Skew Tuning (PST) technique, which can tolerate PVT variations, has attracted broad discussion. A PST architecture can dynamically minimize the clock skew even after a chip is manufactured. However, testing the variation tolerance ability of a PST architecture is very difficult because the clock skew does not directly affect the functionality of a design. In addition, creating PVT variation in the traditional testing environment is not easy. Unlike most previous works, which focus on the implementation and performance issues of a PST architecture, the objective of this paper is to propose efficient test mechanisms and verify the variation tolerance ability. In addition, we propose a novel structure to increase the robustness of a PST architecture in case of a manufacturing fault. Our experiment shows that we can achieve robustness with little overhead.
Citations: 6
Multilevel tree fusion for robust clock networks
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105396
Dongjin Lee, I. Markov
Recent improvements in clock-tree and mesh-based topologies maintain a healthy competition between the two. Trees require much smaller capacitance, but meshes are naturally robust against process variation and can accommodate late design changes. Cross-link insertion has been advocated to make trees more robust, but is limited in practice to short distances. In this work, we develop a novel non-tree topology that fuses several clock trees to create large-scale redundancy in a clock network. Empirical validation shows that our novel clock-network structure incrementally enhances robustness to satisfy given variation constraints. Our implementation, called Contango3.0, produces robust clock networks even for challenging skew limits, without the parallel buffering used by other implementations. It also offers a fine trade-off between power and robustness, increasing the capacitance of the initial tree by less than 60%, which results in 2.3× greater power efficiency than mesh structures.
Citations: 27
Implementation of pulsed-latch and pulsed-register circuits to minimize clocking power
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105397
Seungwhun Paik, Gi-Joon Nam, Youngsoo Shin
A pulsed-latch can be modeled as a fast flip-flop. This allows conventional flip-flop designs to be migrated to pulsed-latch versions by simple replacement to reduce the clocking power. A key step in the migration process is to insert pulsers, which generate clock pulses to drive local latches; the number of pulsers as well as the wirelength of clock routing must be minimized to reduce the clocking power. We formulate a pulser insertion problem to find a set of latch groups where each group shares a pulser and its load constraint is satisfied; both an ILP formulation and a heuristic algorithm are presented to solve the problem. Experimental results of circuits implemented with 32-nm CMOS technology show that the clocking power of pulsed-latch designs obtained by our approach is 5.9% less than that of a greedy approach; this is 44.7% less than that of flip-flop designs. We also consider the problem of pulsed-registers, where a pulser is integrated with multiple latches. A concept of logical distance is explored in our clustering algorithm to minimize the overhead of signal wirelength when converting flip-flops to pulsed-registers. Compared with flip-flop circuits, signal wirelength is increased by 6.3%, which is 1.4% smaller than without considering logical distance, while the clocking power is reduced by 24%.
Citations: 19
Neuromorphic modeling abstractions and simulation of large-scale cortical networks
Pub Date: 2011-11-07 DOI: 10.1109/ICCAD.2011.6105350
J. Krichmar, N. Dutt, J. Nageswaran, Micah Richert
Biological neural systems are well known for their robust and power-efficient operation in highly noisy environments. We outline key modeling abstractions for the brain and focus on spiking neural network models. We discuss aspects of neuronal processing and computational issues related to modeling these processes. Although many of these algorithms can be efficiently realized in specialized hardware, we present a case study of simulation of the visual cortex using a GPU-based simulation environment that is readily usable by neuroscientists and computer scientists and efficient enough to construct very large networks comparable to brain networks.
Citations: 7