2007 25th International Conference on Computer Design: Latest Papers

Scan chain design for three-dimensional integrated circuits (3D ICs)

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601902

Xiaoxia Wu, P. Falkenstern, Yuan Xie

Scan chains are widely used to improve the testability of IC designs. In traditional 2D IC designs, various techniques for constructing scan chains have been proposed to facilitate DFT (Design-For-Test). Recently, three-dimensional (3D) technologies have been proposed as a promising solution to continue technology scaling. In this paper, we study scan chain construction for 3D ICs, examining the impact of 3D technologies on scan chain ordering. Three different 3D scan chain design approaches (namely, VIA3D, MAP3D, and OPT3D) are proposed and compared, with experimental results for ISCAS89 benchmark circuits. The advantages and disadvantages of each approach are discussed. The results show that the MAP3D and VIA3D approaches require no changes to 2D scan chain algorithms, but OPT3D achieves the best wire length reduction for the scan chain design. Across six ISCAS89 benchmarks, the average scan chain wire length obtained with OPT3D is 46.0% shorter than that of the 2D scan chain design. To the best of our knowledge, this is the first study on scan chain design for 3D integrated circuits.
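
To make the notion of scan chain ordering in a 3D stack more concrete, the following is a minimal sketch of a greedy nearest-neighbor ordering over scan cells placed on several device layers. It is not the paper's VIA3D, MAP3D, or OPT3D algorithm; the cost function (Manhattan distance plus a per-layer `via_cost`), the cell representation, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical scan cell placement: (x, y, layer index).
Cell = Tuple[float, float, int]


def distance(a: Cell, b: Cell, via_cost: float = 1.0) -> float:
    """Manhattan distance in the plane plus an assumed cost per layer crossed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + via_cost * abs(a[2] - b[2])


def greedy_order(cells: List[Cell], via_cost: float = 1.0) -> List[int]:
    """Order cells by repeatedly stitching the nearest unvisited cell.

    Starts from cell 0; ties are broken by the lower cell index.
    """
    if not cells:
        return []
    order = [0]
    remaining = set(range(1, len(cells)))
    while remaining:
        last = cells[order[-1]]
        nxt = min(remaining, key=lambda i: (distance(last, cells[i], via_cost), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def chain_length(cells: List[Cell], order: List[int], via_cost: float = 1.0) -> float:
    """Total wire length of the chain that visits cells in the given order."""
    return sum(
        distance(cells[a], cells[b], via_cost) for a, b in zip(order, order[1:])
    )
```

For four cells at `(0, 0, 0)`, `(10, 0, 0)`, `(0, 0, 1)`, and `(10, 0, 1)` with `via_cost=1.0`, the greedy order crosses layers early and yields a chain of length 12, whereas visiting the cells in index order costs 31. A greedy heuristic like this is only a baseline and gives no optimality guarantee.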

Citations: 67

System level power estimation methodology with H.264 decoder prediction IP case study

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601959

Young-Hwan Park, S. Pasricha, F. Kurdahi, N. Dutt

This paper presents a methodology to generate a hierarchy of power models for power estimation of custom hardware IP blocks, enabling a trade-off between power estimation accuracy, modeling effort and estimation speed. Our power estimation approach enables several novel system-level explorations - such as observing the effect of clock gating, and the effects of tweaking application-level parameters on system power - with an estimation accuracy that is close to gate-level. We implemented our methodology on an H.264 video decoder prediction IP case study, created power models, and evaluated the effects of varying design parameters (e.g., clock gating, I/P frame ratios, quantization), allowing rapid system-level power exploration of these design parameters.

Citations: 14

Transparent mode flip-flops for collapsible pipelines

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601952

Eric L. Hill, Mikko H. Lipasti

Prior work has shown that collapsible pipelining techniques have the potential to significantly reduce clocking activity, which can consume up to 70% of the dynamic power in modern high performance microprocessors. Previous collapsible pipeline proposals either rely on single phase clocking (by forcing latches into transparent state) or do not discuss the mechanisms by which stages are merged. In this work two flip-flop designs featuring an additional transparent state suitable for collapsing stages are presented. Transparency is achieved either by decoupling the master and slave clocks to keep both latches transparent, or by using a bypass mux that routes around the flip-flop. Both of these designs are evaluated in the context of transparently gated pipelines, an ad-hoc collapsible pipelining technique. Detailed analysis shows that the decoupled clock flip-flop is the most attractive in terms of energy and delay.

Citations: 4

Benchmarks and performance analysis of decimal floating-point applications

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601896

Liang-Kai Wang, C. Tsen, M. Schulte, Divya Jhalani

The IEEE P754 draft standard for floating-point arithmetic provides specifications for decimal floating-point (DFP) formats and operations. Based on this standard, many developers will provide support for DFP calculations. We present a benchmark suite for DFP applications and use this suite to evaluate the performance of hardware and software DFP solutions. Our benchmarks include banking, commerce, risk-management, tax, and telephone billing applications organized into a suite of five macro benchmarks. In addition to developing our own applications, we leverage open-source projects and academic financial analysis applications. The benchmarks are modular, making them easy to adapt for different DFP solutions. We use the benchmarks to evaluate the performance of the decNumber DFP library and an extended version of the SimpleScalar PISA architecture with hardware and instruction set support for DFP operations. Our analysis shows that providing processor support for high-speed DFP operations significantly improves the performance of DFP applications.
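
As a small illustration of why billing-style workloads favor decimal arithmetic, the sketch below uses Python's standard `decimal` module, a software DFP implementation. It is not taken from the benchmark suite; the `call_cost` function, the rates, and the rounding policy are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")  # billing granularity assumed for this example


def call_cost(seconds: int, rate_per_min: Decimal, tax_rate: Decimal) -> Decimal:
    """Price a call, rounding the price and the tax to whole cents.

    Hypothetical billing rule: round-half-even at each step.
    """
    price = (Decimal(seconds) / 60 * rate_per_min).quantize(
        CENT, rounding=ROUND_HALF_EVEN
    )
    tax = (price * tax_rate).quantize(CENT, rounding=ROUND_HALF_EVEN)
    return price + tax


# Binary floating point cannot represent 0.1 or 0.2 exactly, so this is False.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)

# The same sum in decimal floating point is exact, so this is True.
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

A 90-second call at 0.10 per minute with a 5% tax comes to 0.15 plus 0.01 tax, so `call_cost(90, Decimal("0.10"), Decimal("0.05"))` returns `Decimal("0.16")`.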

Citations: 39

Continual hashing for efficient fine-grain state inconsistency detection

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601877

Jae W. Lee, Myron King, K. Asanović

Transaction-level modeling (TLM) allows a designer to save functional verification effort during the modular refinement of an SoC by reusing the prior implementation of a module as a golden model for state inconsistency detection. One problem in simulation-based verification is the performance and bandwidth overhead of state dump and comparison between two models. In this paper, we propose an efficient fine-grain state inconsistency detection technique that checks the consistency of two states of arbitrary size at sub-transaction (tick) granularity using incremental hashes. At each tick, the hash generates a signature of the entire state, which can be efficiently updated and compared. We evaluate the proposed signature scheme with a FIR filter and a Vorbis decoder and show that very fine-grain state consistency checking is feasible. The hash signature checking increases execution time of Bluespec RTL simulation by 1.2% for the FIR filter and by 2.2% for the Vorbis decoder while correctly detecting any injected state inconsistency.
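
The idea of a signature that covers the entire state yet is cheap to update can be sketched as follows. The abstract does not specify the hash construction, so this example assumes a simple XOR of per-entry hashes; the `HashedState` class, the `entry_hash` helper, and the register names are all hypothetical.

```python
import hashlib
from typing import Dict


def entry_hash(name: str, value: int) -> int:
    """64-bit hash of one (name, value) pair of the state."""
    digest = hashlib.blake2b(f"{name}={value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HashedState:
    """State whose signature is the XOR of the hashes of all its entries.

    XOR is its own inverse, so a write updates the signature in constant
    time: remove the old entry's hash, then add the new entry's hash.
    """

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.signature = 0

    def write(self, name: str, value: int) -> None:
        if name in self.values:
            self.signature ^= entry_hash(name, self.values[name])
        self.values[name] = value
        self.signature ^= entry_hash(name, value)


def full_signature(values: Dict[str, int]) -> int:
    """Recompute the signature from scratch, for cross-checking."""
    sig = 0
    for name, value in values.items():
        sig ^= entry_hash(name, value)
    return sig
```

With a golden model and a refined model each wrapped in a `HashedState`, comparing the two `signature` fields at every tick replaces a full state dump and comparison. Equal states always give equal signatures; differing states collide only with small probability under this assumed 64-bit construction.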

Citations: 1

An efficient gate delay model for VLSI design

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601938

T. Chiang, C. Y. Chen, Weiyu Chen

Accurate estimation of gate delays is essential for timing-related CAD tools. CAD researchers tend to use the Elmore delay model for estimating gate delays. Since the Elmore delay model was primarily developed for estimating interconnection delay, applying it to gate delay estimation introduces significant inaccuracy. In this paper, by embedding concepts of electronic theories into switch-level analysis, a simple and efficient delay model for gates of general types (such as NAND, NOR, and complex gates) is proposed. Experimental data show that the proposed gate delay model consistently achieves high accuracy (typically within around 2% of SPICE simulations).
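
For reference, the classic Elmore delay that the abstract describes as the common baseline can be computed for a simple RC ladder as shown below. This is the standard textbook formula, not the gate delay model proposed in the paper; the function name and the ladder representation are assumptions.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay at the far end of an RC ladder.

    Segment i has series resistance R_i followed by capacitance C_i to
    ground. Each resistor contributes R_i times all capacitance downstream
    of it: delay = sum_i R_i * sum_{j >= i} C_j.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)  # capacitance seen through the first resistor
    for r, c in zip(resistances, capacitances):
        delay += r * downstream
        downstream -= c  # this node's capacitance is upstream of the next resistor
    return delay
```

For a two-segment ladder with resistances `[1, 2]` and capacitances `[3, 4]`, the result is 1 × (3 + 4) + 2 × 4 = 15, in units of ohms times farads.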

Citations: 5

An automated runtime power-gating scheme

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601928

M. Hamada, T. Kitahara, N. Kawabe, Hironori Sato, T. Nishikawa, T. Shimazawa, T. Yamashita, H. Hara, Y. Oowaki

An automated runtime power-gating scheme to reduce the leakage power in the active mode is presented in this paper. We propose a circuit that automatically generates a sleep control signal from a clock-gating control signal. By combining a selective MT-CMOS scheme, the generated sleep control signal, and a novel flip-flop circuit with an additional latch function, a zero-wait transition from a sleep mode to an active mode is enabled. The additional latch function required for the zero-wait transition is achieved with only 6 transistors in addition to a conventional flip-flop. With this scheme, any design with the clock-gating scheme can be transformed automatically to a power-gated design while keeping the system operation the same in terms of the cycle accuracy. The scheme is applied to an MPEG4/H.264 audio/video codec and 21% power saving is achieved in the active mode while keeping the area overhead to only 16% in a 90 nm CMOS design.

Citations: 0

Whitespace redistribution for thermal via insertion in 3D stacked ICs

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601912

E. Wong, S. Lim

One of the biggest challenges in 3D stacked IC design is heat dissipation. Incorporating thermal vias is a promising method for reducing the temperatures of 3D ICs. The bonding styles between device layers impose certain restrictions on where thermal vias may be inserted. This paper presents a whitespace redistribution algorithm that takes bonding style into consideration to improve thermal via placement, which in turn reduces temperature.

Citations: 15

A position-insensitive finished store buffer

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601888

Erika Gunadi, Mikko H. Lipasti

This paper presents the finished store buffer (or FSB), an alternative and position-insensitive approach for building a scalable store buffer for an out-of-order processor. Exploiting the fact that only a small portion of in-flight stores are done executing (i.e., finished) and waiting for retirement, we are able to build a much smaller and more scalable store buffer. Our study shows that we need at most half the number of entries of a conventional store queue if we buffer only the stores that have finished execution. Entries in the store buffer are allocated at issue and deallocated on retirement. A clever encoder circuit is used to provide positional searches without an explicitly positional queue structure. While reducing the access latency and power consumption significantly, our technique has virtually no detrimental effect on per-cycle performance (IPC).

Citations: 7

An efficient routing method for pseudo-exhaustive built-in self-testing of high-speed interconnects

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601925

Jianxun Liu, W. Jone

This paper presents a powerful routing method for pseudo-exhaustive built-in self-testing of high-speed interconnects with both capacitive and inductive crosstalk effects. Based on the concepts of test cone and cut-off locality, the routing method can generate an interconnect structure such that all nets can be tested by pseudo-exhaustive patterns. The test pattern generation method is simple and efficient. Experimental results obtained by simulating a set of MCNC benchmarks demonstrate the feasibility of the proposed pseudo-exhaustive test approach and the efficiency of the proposed routing method.

Citations: 0
