Ant Colony Optimization directed program abstraction for software bounded model checking
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751839
Xueqi Cheng, M. Hsiao
The increasing complexity and size of software designs have made scalability a major bottleneck in software verification. Program abstraction has shown potential for alleviating this problem through selective search-space reduction. In this paper, we propose an Ant Colony Optimization (ACO)-directed program structure construction to formulate a novel under-approximation-based program abstraction (UAPA). By taking advantage of the resulting abstraction, a new software bounded model checking framework is built with the aim of improving the performance of property checking, especially property falsification. Experimental results on various programs show that the proposed ACO-directed program abstraction can dramatically improve the performance of software bounded model checking, with significant speedups.
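For readers unfamiliar with the heuristic itself, a generic ACO selection loop can be sketched as follows. This is only an illustrative skeleton for choosing which program branches to keep in an under-approximation, with a placeholder scoring function; it is not the authors' actual construction.

```python
import random

def aco_select_branches(branches, score, n_ants=10, n_iters=20,
                        evaporation=0.5, base_keep_prob=0.3):
    """Generic ACO skeleton: choose a subset of `branches` to retain in an
    under-approximated program.  `score(subset)` is a placeholder fitness
    function (e.g., an estimate of how useful the subset is for falsification)."""
    pheromone = {b: 1.0 for b in branches}
    best_subset, best_score = list(branches), float("-inf")

    for _ in range(n_iters):
        trials = []
        for _ in range(n_ants):
            avg = sum(pheromone.values()) / len(pheromone)
            # Keep each branch with probability scaled by its relative pheromone level.
            subset = [b for b in branches
                      if random.random() < min(1.0, base_keep_prob * pheromone[b] / avg)]
            trials.append((score(subset), subset))

        # Evaporation, then reinforcement of the branches used by this iteration's best ant.
        for b in pheromone:
            pheromone[b] *= (1.0 - evaporation)
        it_score, it_subset = max(trials, key=lambda t: t[0])
        for b in it_subset:
            pheromone[b] += 1.0

        if it_score > best_score:
            best_score, best_subset = it_score, it_subset
    return best_subset
```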
{"title":"Ant Colony Optimization directed program abstraction for software bounded model checking","authors":"Xueqi Cheng, M. Hsiao","doi":"10.1109/ICCD.2008.4751839","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751839","url":null,"abstract":"The increasing complexity and size of software designs has made scalability a major bottleneck in software verification. Program abstraction has shown potential in alleviating this problem through selective search space reduction. In this paper, we propose an Ant Colony Optimization (ACO)-directed program structure construction to formulate a novel under-approximation based program abstraction (UAPA). By taking advantage of the resulting abstraction, a new software bounded model checking framework is built with the aim of improving the performance of property checking, especially for property falsification. Experimental results on various programs showed that the proposed ACO-directed program abstraction can dramatically improve the performance of software bounded model checking with significant speedups.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123318527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acquiring an exhaustive, continuous and real-time trace from SoCs
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751885
C. Hochberger, A. Weiss
The amount of time and resources that must be spent on debugging embedded cores continues to increase. Approaches that were valid ten years ago can no longer be used, owing to the variety and complexity of the peripheral components of SoC solutions, which may even consist of multiple heterogeneous cores. Although there are some initiatives to standardize and leverage embedded debugging capabilities, current debugging solutions cover only a fraction of the problems in this area. In this contribution we present a new approach for debugging and tracing SoCs. The new approach, called hidICE (hidden ICE), delivers an exhaustive, continuous and real-time trace with much lower system interference than state-of-the-art solutions.
{"title":"Acquiring an exhaustive, continuous and real-time trace from SoCs","authors":"C. Hochberger, A. Weiss","doi":"10.1109/ICCD.2008.4751885","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751885","url":null,"abstract":"The amount of time and resources that have to be spent on debugging of embedded cores continuously increases. Approaches valid 10 years ago can no longer be used due to the variety and complexity of peripheral components of SoC solutions that even might consist of multiple heterogeneous cores. Although there are some initiatives to standardize and leverage the embedded debugging capabilities, current debugging solutions only cover a fraction of the problems present in that area. In this contribution we show a new approach for debugging and tracing SoCs. The new approach, called hidICE (hidden ICE), delivers an exhaustive, continuous and real-time trace with much lower system interference compared to state-of-the-art solutions.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115666830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling and reduction of complex timing constraints in high performance digital circuits
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751914
V. Nagbhushan, C. Y. Chen
Complex timing constraints that refer to multiple clocks and/or edges are often used in the design of modern high-performance processors. Such constraints complicate downstream algorithms such as logic synthesis. The complexity of the overall CAD system can be reduced considerably if the timing constraints can be optimally transformed so that they refer only to a single clock and edge. In this paper, we show how to model these multi-clock/edge timing constraints and describe algorithms to reduce the number of reference clocks/edges. We address the important problems of accurately handling signal transitions, sequential elements, input slope variations and timing overrides, which have not been addressed before.
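To make the idea of re-referencing concrete, the toy sketch below shifts a timing check defined relative to one clock's edge so that it is expressed relative to a single reference clock, using only the clocks' periods and phase offsets. It is purely illustrative and is not the reduction algorithm described in the paper, which also handles transitions, sequential elements, slope variation, and overrides.

```python
from dataclasses import dataclass

@dataclass
class ClockEdge:
    period: float   # ns
    offset: float   # ns, time of the first rising edge after t = 0
    # (falling edges could be modeled with an extra half-period shift)

def rereference(check_time, src: ClockEdge, ref: ClockEdge):
    """Express a timing check defined `check_time` ns after an edge of `src`
    as a delay after the nearest earlier edge of `ref`.  Illustrative only."""
    absolute = src.offset + check_time            # absolute time of the check
    k = (absolute - ref.offset) // ref.period     # index of the preceding ref edge
    return absolute - (ref.offset + k * ref.period)

# Example: a check 1.2 ns after a clkB edge, re-expressed relative to clkA.
clkA = ClockEdge(period=2.0, offset=0.0)
clkB = ClockEdge(period=3.0, offset=0.5)
print(rereference(1.2, src=clkB, ref=clkA))       # ~1.7 ns after a clkA edge
```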
{"title":"Modeling and reduction of complex timing constraints in high performance digital circuits","authors":"V. Nagbhushan, C. Y. Chen","doi":"10.1109/ICCD.2008.4751914","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751914","url":null,"abstract":"Complex timing constraints that refer to multiple clocks and/or edges are often used in the design of modern high performance processors. Such constraints complicate the design of downstream algorithms such as logic synthesis. The complexity of the overall CAD system can be reduced considerably if we can optimally transform the timing constraints so that they refer only to a single clock and edge. In this paper, we show how to model these multi clock/edge timing constraints and describe algorithms to reduce the number reference clocks/edges. We address the important problems of accurately handling signal transitions, sequential elements, input slope variations and timing overrides, which have not been addressed before.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131041310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RMA: A Read Miss-Based Spin-Down Algorithm using an NV Cache
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751910
Hyotaek Shim, Jaegeuk Kim, Dawoon Jung, Jin-Soo Kim, S. Maeng
Reducing the power consumption of the hard disk, which accounts for a large share of a computer system's power, is an important issue. As a new trend, an NV cache is used to keep the disk spun down longer by servicing read/write requests in place of the disk. During spin-down periods, write requests can be handled simply by write buffering, but read requests remain the main cause of spin-ups because of the low hit ratio in the NV cache. Even when there is no user activity, read requests can be generated frequently by running applications and system services, hindering spin-down. In this paper, we propose new NV cache policies: active write caching, which reduces or delays spin-ups caused by read misses during spin-down periods, and a read miss-based spin-down algorithm, which extends the spin-down periods by exploiting the NV cache effectively. Our policies reduce the power consumption of a hard disk by up to 50.1% with a 512 MB NV cache, compared with preceding approaches.
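The decision logic of a read miss-driven spin-down policy can be sketched roughly as below. The NV cache and disk interfaces, the breakeven interval, and the thresholds are illustrative assumptions, not the paper's actual parameters.

```python
class SpinDownController:
    """Toy sketch of a read miss-based spin-down policy with an NV cache.
    Writes are buffered in the NV cache while the disk is spun down; the disk
    is spun up only on an NV-cache read miss, and is spun back down once read
    misses have been absent for longer than an (assumed) breakeven interval."""

    def __init__(self, nv_cache, disk, breakeven_s=10.0):
        self.nv_cache, self.disk = nv_cache, disk
        self.breakeven_s = breakeven_s
        self.last_read_miss = 0.0

    def on_write(self, now, block, data):
        if self.disk.spun_down:
            self.nv_cache.buffer_write(block, data)   # absorb the write, keep the disk idle
        else:
            self.disk.write(block, data)

    def on_read(self, now, block):
        if self.nv_cache.contains(block):
            return self.nv_cache.read(block)          # hit: no spin-up needed
        self.last_read_miss = now
        if self.disk.spun_down:
            self.disk.spin_up()                       # unavoidable spin-up on a read miss
            self.nv_cache.flush_buffered_writes(self.disk)
        return self.disk.read(block)

    def tick(self, now):
        # Spin down again once read misses have been quiet long enough.
        if not self.disk.spun_down and now - self.last_read_miss > self.breakeven_s:
            self.disk.spin_down()
```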
{"title":"RMA: A Read Miss-Based Spin-Down Algorithm using an NV Cache","authors":"Hyotaek Shim, Jaegeuk Kim, Dawoon Jung, Jin-Soo Kim, S. Maeng","doi":"10.1109/ICCD.2008.4751910","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751910","url":null,"abstract":"It is an important issue to reduce the power consumption of a hard disk that takes a large amount of computer systempsilas power. As a new trend, an NV cache is used to make a disk spin down longer by servicing read/write requests instead of the disk. During the spin-down periods, write requests can be simply handled by write buffering, but read requests are still the main cause of initiating spin-ups because of a low hit ratio in the NV cache. Even when there is no user activity, read requests can be frequently generated by running applications and system services, hindering the spin-down. In this paper, we propose new NV cache policies: active write caching to reduce or to delay spin-ups caused by read misses during spin-down periods and a read miss-based spin-down algorithm to extend the spin-down periods, exploiting the NV cache effectively. Our policies reduce the power consumption of a hard disk by up to 50.1% with a 512 MB NV cache, compared with preceding approaches.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127207312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerating search and recognition with a TCAM functional unit
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751844
Atif Hashmi, Mikko H. Lipasti
World data is increasing rapidly, doubling almost every three years [1][2]. To comprehend and use this data effectively, search and recognition (SR) applications will demand more computational power in the future. The inherent speedups that these applications gain from frequency scaling will no longer exist as processor vendors move away from frequency scaling and towards multi-core architectures. Thus, modifications to both the structure of SR applications and current processor architectures are required to meet the computational needs of these workloads. This paper describes a novel hardware acceleration scheme to improve the performance of SR applications. The hardware accelerator relies on Ternary Content-Addressable Memory (TCAM) and some straightforward ISA extensions to deliver a promising speedup of 3.0-4.0 for SR workloads such as Template Matching, BLAST, and multi-threaded applications using Software Transactional Memory (STM).
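For readers unfamiliar with ternary matching, the snippet below emulates a TCAM lookup in software: each stored entry carries a value and a mask, and "don't care" bits are simply masked out before comparison. It is only a functional model of what such a unit does, not the proposed hardware or its ISA extensions.

```python
class TCAM:
    """Software model of a ternary CAM: entries are (value, mask) pairs,
    where mask bits set to 0 mark "don't care" positions."""

    def __init__(self):
        self.entries = []                 # list of (value, mask, tag)

    def insert(self, value, mask, tag):
        self.entries.append((value, mask, tag))

    def search(self, key):
        # Return the tag of the first entry whose cared-about bits match the key.
        for value, mask, tag in self.entries:
            if (key & mask) == (value & mask):
                return tag
        return None

tcam = TCAM()
tcam.insert(0b1010_0000, 0b1111_0000, "pattern A")   # low nibble is "don't care"
print(tcam.search(0b1010_0110))                      # -> "pattern A"
```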
{"title":"Accelerating search and recognition with a TCAM functional unit","authors":"Atif Hashmi, Mikko H. Lipasti","doi":"10.1109/ICCD.2008.4751844","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751844","url":null,"abstract":"World data is increasing rapidly, doubling almost every three years[1][2]. To comprehend and use this data effectively, search and recognition (SR) applications will demand more computational power in the future. The inherent speedups that these applications get due to frequency scaling will no longer exist as processor vendors move away from frequency scaling and towards multi-core architectures. Thus, modifications to both the structure of SR applications and current processor architectures are required to meet the computational needs of these workloads. This paper describes a novel hardware acceleration scheme to improve the performance of SR applications. The hardware accelerator relies on Ternary Content-Addressable Memory and some straightforward ISA extensions to deliver a promising speedup of 3.0-4.0 for SR workloads like Template Matching, BLAST, and multi-threaded applications using Software Transactional Memory (STM).","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical simulation-based verification of Anton, a special-purpose parallel machine
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751883
J. P. Grossman, J. Salmon, C. R. Ho, D. Ierardi, Brian Towles, Brannon Batson, Jochen Spengler, Stanley C. Wang, Rolf Mueller, Michael Theobald, C. Young, Joseph Gagliardo, Martin M. Deneroff, R. Dror, D. Shaw
One of the major design verification challenges in the development of Anton, a massively parallel special-purpose machine for molecular dynamics, was to provide evidence that computations spanning more than a quadrillion clock cycles would produce valid scientific results. Our verification methodology addressed this problem by using a hierarchy of RTL, architectural, and numerical simulations. Block- and chip-level RTL models were verified by means of extensive co-simulation with a detailed C++ architectural simulator, ensuring that the RTL models could perform the same molecular dynamics computations as the architectural simulator. The output of the architectural simulator was compared to a parallelized numerical simulator that produces bitwise identical results to Anton and is fast enough to verify the long-term numerical stability of computations on Anton. These explicit couplings between adjacent levels of the simulation hierarchy created a continuous verification chain from molecular dynamics to individual logic gates.
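The essence of co-simulation between adjacent levels of such a hierarchy is a lockstep comparison of observable state. The sketch below shows that pattern in generic form; the simulator interfaces (`step`, `snapshot`) and the checkpoint interval are illustrative assumptions, not Anton's actual verification environment.

```python
def cosimulate(rtl_sim, arch_sim, n_cycles, checkpoint_every=1000):
    """Run two simulators of the same design in lockstep and flag the first
    divergence.  Both simulators are assumed to expose step(cycles) and
    snapshot() methods; the names are illustrative."""
    for cycle in range(0, n_cycles, checkpoint_every):
        rtl_sim.step(checkpoint_every)
        arch_sim.step(checkpoint_every)
        rtl_state, arch_state = rtl_sim.snapshot(), arch_sim.snapshot()
        if rtl_state != arch_state:
            raise AssertionError(f"divergence detected at cycle {cycle + checkpoint_every}")
```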
{"title":"Hierarchical simulation-based verification of Anton, a special-purpose parallel machine","authors":"J. P. Grossman, J. Salmon, C. R. Ho, D. Ierardi, Brian Towles, Brannon Batson, Jochen Spengler, Stanley C. Wang, Rolf Mueller, Michael Theobald, C. Young, Joseph Gagliardo, Martin M. Deneroff, R. Dror, D. Shaw","doi":"10.1109/ICCD.2008.4751883","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751883","url":null,"abstract":"One of the major design verification challenges in the development of Anton, a massively parallel special-purpose machine for molecular dynamics, was to provide evidence that computations spanning more than a quadrillion clock cycles will produce valid scientific results. Our verification methodology addressed this problem by using a hierarchy of RTL, architectural, and numerical simulations. Block- and chip-level RTL models were verified by means of extensive co-simulation with a detailed C++ architectural simulator, ensuring that the RTL models could perform the same molecular dynamics computations as the architectural simulator. The output of the architectural simulator was compared to a parallelized numerical simulator that produces bitwise identical results to Anton, and is fast enough to verify the long-term numerical stability of computations on Anton. These explicit couplings between adjacent levels of the simulation hierarchy created a continuous verification chain from molecular dynamics to individual logic gates.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124183673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global bus route optimization with application to microarchitectural design exploration
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751931
Daehyun Kim, S. Lim
Circuit and processor designs will continue to increase in complexity for the foreseeable future. With these increasing sizes comes the use of wide buses to move large amounts of data from one place to another. Bus routing has therefore become increasingly important. In this paper, we present a new bus routing algorithm that globally optimizes both the floorplan and the bus routes themselves. Our algorithm is based on creating a range of feasible bus positions and then using Linear Programming to optimally solve for bus locations. We present this algorithm for use in microarchitectures and explore several different optimization objectives, including performance, floorplan area, and power consumption. Our results demonstrate that this algorithm is effective for efficiently generating feasible routes for complex modern designs and provides better results than previous approaches.
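A stripped-down version of the "feasible range plus linear program" idea can be written with an off-the-shelf LP solver. The sketch below places each bus at a single coordinate within a precomputed feasible interval while minimizing total displacement from preferred positions; the objective and intervals are placeholders, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

def place_buses(feasible, preferred):
    """Pick one coordinate per bus inside its feasible interval, minimizing
    total |x_i - preferred_i|.  `feasible` is a list of (lo, hi) pairs.
    Toy LP illustration of range-based bus placement, not the paper's model."""
    n = len(feasible)
    # Variables: x_0..x_{n-1} (positions), t_0..t_{n-1} (surrogates for |x_i - p_i|)
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub, b_ub = [], []
    for i, p in enumerate(preferred):
        row = np.zeros(2 * n); row[i] = 1.0; row[n + i] = -1.0
        A_ub.append(row); b_ub.append(p)          #  x_i - t_i <= p_i
        row = np.zeros(2 * n); row[i] = -1.0; row[n + i] = -1.0
        A_ub.append(row); b_ub.append(-p)         # -x_i - t_i <= -p_i
    bounds = list(feasible) + [(0, None)] * n
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return res.x[:n]

print(place_buses([(0, 4), (5, 9)], preferred=[6, 6]))   # positions ~ [4, 6]
```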
{"title":"Global bus route optimization with application to microarchitectural design exploration","authors":"Daehyun Kim, S. Lim","doi":"10.1109/ICCD.2008.4751931","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751931","url":null,"abstract":"Circuit and processor designs will continue to increase in complexity for the foreseeable future. With these increasing sizes comes the use of wide buses to move large amounts of data from one place to another. Bus routing has therefore become increasingly important. In this paper, we present a new bus routing algorithm that globally optimizes both the floorplan and the bus routes themselves. Our algorithm is based on creating a range of feasible bus positions and then using Linear Programming to optimally solve for bus locations. We present this algorithm for use in microarchitectures and explore several different optimization objectives, including performance, floorplan area, and power consumption. Our results demonstrate that this algorithm is effective for efficiently generating feasible routes for complex modern designs and provides better results than previous approaches.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124291231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZZ-HVS: Zig-zag horizontal and vertical sleep transistor sharing to reduce leakage power in on-chip SRAM peripheral circuits
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751937
H. Homayoun, Avesta Sasan, A. Veidenbaum
Recent studies indicate that peripheral circuits (including decoders, wordline drivers, and input and output drivers) constitute a large portion of cache leakage. In addition, as technology migrates to smaller geometries, the leakage contribution to total power consumption increases faster than dynamic power, making leakage the largest power consumption factor. This paper proposes zig-zag share, a circuit technique to reduce leakage in SRAM peripheral circuits. Using architectural control of zig-zag share, an integrated technique called Sleep-Share is proposed and applied to L1 and L2 caches. The results show leakage reductions of up to 40X in deeply pipelined SRAM peripheral circuits, with only a 4% area overhead and a small additional delay.
{"title":"ZZ-HVS: Zig-zag horizontal and vertical sleep transistor sharing to reduce leakage power in on-chip SRAM peripheral circuits","authors":"H. Homayoun, Avesta Sasan, A. Veidenbaum","doi":"10.1109/ICCD.2008.4751937","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751937","url":null,"abstract":"Based on Recent studies peripheral circuit (including decoders, wordline drivers, input and output drivers) constitutes a large portion of the cache leakage. In addition as technology migrate to smaller geometries, leakage contribution to total power consumption increases faster than dynamic power, promoting leakage as the largest power consumption factor. This paper proposes zig-zag share, a circuit technique to reduce leakage in SRAM peripheral. Using architectural control of zig-zag share, an integrated technique called Sleep-Share is proposed and applied in L1 and L2 caches. The results show leakage reduction by up to 40X in deeply pipelined SRAM peripheral circuits, with only a 4% area overhead and small additional delay.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123661605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Techniques for increasing effective data bandwidth
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751909
C. Nitta, M. Farrens
In this paper we examine techniques for increasing the effective bandwidth of the microprocessor off-chip interconnect. We focus on mechanisms that are orthogonal to other techniques currently being studied (3-D fabrication, optical interconnect, etc.). Using a range of full-system simulations, we study the distribution of values being transferred to and from memory and find that (as expected) high-entropy data such as floating-point numbers have limited compressibility, but that other data types offer more potential for compression. By using a simple heuristic to classify the contents of a cache line and providing different compression schemes for each classification, we show it is possible to provide overall compression at cache-line granularity comparable to that obtained by using a much more complex Lempel-Ziv-Welch algorithm.
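The flavor of a per-class compression scheme can be illustrated as below: a cheap heuristic inspects a 64-byte line and routes it to a class-specific encoder, with zlib standing in as the heavyweight dictionary-based baseline. The classes and encoders here are simplified stand-ins, not those evaluated in the paper.

```python
import struct
import zlib

LINE_BYTES = 64

def classify(line: bytes) -> str:
    """Cheap heuristic over a 64-byte cache line (illustrative classes only)."""
    if line == b"\x00" * LINE_BYTES:
        return "zero"
    words = struct.unpack("<16I", line)             # view the line as 16 x 32-bit words
    if all(w < 256 for w in words):
        return "narrow"                             # small integers: keep low byte only
    return "other"

def compress_line(line: bytes) -> bytes:
    kind = classify(line)
    if kind == "zero":
        return b"Z"                                 # 1-byte code for an all-zero line
    if kind == "narrow":
        words = struct.unpack("<16I", line)
        return b"N" + bytes(w & 0xFF for w in words)  # 1 + 16 bytes
    return b"O" + zlib.compress(line)               # fall back to dictionary compression

line = struct.pack("<16I", *range(16))
print(classify(line), len(compress_line(line)))     # -> narrow 17
```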
{"title":"Techniques for increasing effective data bandwidth","authors":"C. Nitta, M. Farrens","doi":"10.1109/ICCD.2008.4751909","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751909","url":null,"abstract":"In this paper we examine techniques for increasing the effective bandwidth of the microprocessor off-chip interconnect. We focus on mechanisms that are orthogonal to other techniques currently being studied (3-D fabrication, optical interconnect, etc.) Using a range of full-system simulations we study the distribution of values being transferred to and from memory, and find that (as expected) high entropy data such as floating point numbers have limited compressibility, but that other data types offer more potential for compression. By using a simple heuristic to classify the contents of a cache line and providing different compression schemes for each classification, we show it is possible to provide overall compression at a cache line granularity comparable to that obtained by using a much more complex Lempel-Ziv-Welch algorithm.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117121737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems
Pub Date: 2008-10-01 | DOI: 10.1109/ICCD.2008.4751927
Baoxian Zhao, Hakan Aydin, Dakai Zhu
The dynamic voltage scaling (DVS) technique is the basis of numerous state-of-the-art energy management schemes proposed for real-time embedded systems. However, recent research has illustrated the alarmingly negative impact of DVS on task and system reliability. In this paper, we consider the problem of processing frequency assignment to a set of real-time tasks in order to maximize the overall reliability, under given time and energy constraints. First, we formulate the problem as a non-linear optimization problem and show how to obtain the static optimal solution. Then, we propose on-line (dynamic) algorithms that detect early completions and adjust the task frequencies at run-time, to improve overall reliability. Our simulation results indicate that our algorithms perform comparably to a clairvoyant optimal scheduler that knows the exact workload in advance.
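As a hedged illustration of what such a non-linear formulation can look like, assume each task i has worst-case execution cycles c_i and runs at a normalized frequency f_i in [f_min, 1], with the exponential fault-rate model that is common in the reliability-aware DVS literature. The symbols, model, and constraints below are assumptions for illustration and may differ from the paper's exact formulation.

```latex
\begin{align*}
\max_{f_1,\dots,f_n}\ & \prod_{i=1}^{n} R_i(f_i)
    = \prod_{i=1}^{n} \exp\!\Bigl(-\lambda(f_i)\,\frac{c_i}{f_i}\Bigr),
  \qquad \lambda(f) = \lambda_0\, 10^{\,d\,\frac{1-f}{1-f_{\min}}} \\
\text{s.t. }\ & \sum_{i=1}^{n} \frac{c_i}{f_i} \le D
  \qquad \text{(timing constraint)} \\
  & \sum_{i=1}^{n} P(f_i)\,\frac{c_i}{f_i} \le E
  \qquad \text{(energy budget)} \\
  & f_{\min} \le f_i \le 1, \qquad i = 1,\dots,n
\end{align*}
```

Here lambda_0 is the fault rate at the maximum frequency, d > 0 captures how quickly faults increase as voltage and frequency are lowered, and P(f) is the power drawn at frequency f; the on-line variants described in the abstract then adjust the f_i at run-time as tasks complete early.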
{"title":"Reliability-aware Dynamic Voltage Scaling for energy-constrained real-time embedded systems","authors":"Baoxian Zhao, Hakan Aydin, Dakai Zhu","doi":"10.1109/ICCD.2008.4751927","DOIUrl":"https://doi.org/10.1109/ICCD.2008.4751927","url":null,"abstract":"The dynamic voltage scaling (DVS) technique is the basis of numerous state-of-the-art energy management schemes proposed for real-time embedded systems. However, recent research has illustrated the alarmingly negative impact of DVS on task and system reliability. In this paper, we consider the problem of processing frequency assignment to a set of real-time tasks in order to maximize the overall reliability, under given time and energy constraints. First, we formulate the problem as a non-linear optimization problem and show how to obtain the static optimal solution. Then, we propose on-line (dynamic) algorithms that detect early completions and adjust the task frequencies at run-time, to improve overall reliability. Our simulation results indicate that our algorithms perform comparably to a clairvoyant optimal scheduler that knows the exact workload in advance.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133772497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}