2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Papers

Application-aware deadlock-free oblivious routing based on extended turn-model
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105328
Ali Shafiee, M. Zolghadr, M. Arjomand, H. Sarbazi-Azad
Programmable hardware is gaining popularity because it can keep pace with growing performance demands under the tight power budgets, design and test costs, and serious reliability concerns of future multiprocessor embedded systems. In line with this trend, the Network-on-Chip, a potential bottleneck of future multi-cores, should also support programmability. Here, we address this issue in the design and implementation of a routing algorithm for two-dimensional meshes. To this end, we allocate paths based on the input traffic pattern while simultaneously customizing the routing restrictions needed for deadlock freedom. To achieve this, we propose the extended turn model (ETM), a novel parametric deadlock-free routing scheme for 2D meshes that generalizes prior turn-based routing methods (e.g., odd-even) with a large degree of freedom. This model facilitates the design of a Mixed-Integer Linear Programming (MILP) formulation, which treats channel dependency turns as independent variables and decides both path allocation and routing restrictions. We solve this problem with a genetic algorithm and evaluate it using simulation experiments. Results show that application-aware ETM-based path allocation outperforms prior turn-based approaches under synthetic and real traffic loads.
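To make the notion of a turn-based routing restriction concrete, here is a minimal sketch of the odd-even turn model that the abstract cites as a prior method. It is not the ETM itself, whose parameters the abstract does not give. The direction constants, function names, and the convention that column parity is taken from the x coordinate are assumptions made for illustration.

```python
# Directions as unit steps (dx, dy) on a 2D mesh; names are illustrative.
EAST, WEST, NORTH, SOUTH = (1, 0), (-1, 0), (0, 1), (0, -1)


def turn_allowed_odd_even(x, incoming, outgoing):
    """Return True if a packet travelling in direction `incoming` may turn to
    `outgoing` at a node in column `x` under the odd-even turn model."""
    even_column = (x % 2 == 0)
    # East-to-North and East-to-South turns are forbidden in even columns.
    if even_column and incoming == EAST and outgoing in (NORTH, SOUTH):
        return False
    # North-to-East and South-to-East turns are forbidden in odd columns.
    if not even_column and incoming in (NORTH, SOUTH) and outgoing == EAST:
        return False
    return True


def path_is_legal(path):
    """Check every turn along a path given as a list of adjacent (x, y) nodes.
    Straight segments are always allowed; U-turns are not examined here."""
    for a, b, c in zip(path, path[1:], path[2:]):
        incoming = (b[0] - a[0], b[1] - a[1])
        outgoing = (c[0] - b[0], c[1] - b[1])
        if incoming != outgoing and not turn_allowed_odd_even(b[0], incoming, outgoing):
            return False
    return True


if __name__ == "__main__":
    print(path_is_legal([(0, 0), (1, 0), (1, 1)]))          # East-to-North turn in an odd column
    print(path_is_legal([(0, 0), (1, 0), (2, 0), (2, 1)]))  # East-to-North turn in an even column
```

A path allocator in the spirit of the abstract would presumably search over such restriction sets rather than fix them in advance; that search is outside the scope of this sketch.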
Citations: 9
A trace compression algorithm targeting power estimation of long benchmarks
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105406
A. Ayupov, S. Burns
This paper presents an algorithm for compressing long traces generated by RTL or other fast simulation. Power analysis tools can use the compressed traces to estimate the power of the original traces. We show that the length of the compressed trace is independent of the length of the original trace and is instead a function of the size of the circuit (more precisely, of its active part) for which the trace was generated. Our experiments show compression ratios of up to 578× on several long RTL traces (up to 320,000 clock transitions) used for power analysis of three industrial blocks (4K, 114K and 202K gates). This leads to a significant runtime improvement, especially when the traces are reused over multiple power analysis runs. The dynamic power estimated using compressed traces is within 5% of the power obtained by analyzing the original traces.
Citations: 1
Efficient analytical macromodeling of large analog circuits by Transfer Function Trajectories
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105311
Dimitri de Jonghe, G. Gielen
Automated abstraction of large analog circuits greatly reduces simulation time in custom analog design flows. Because circuits vary so widely, this task is still mainly carried out manually and ad hoc. This paper proposes an automated modeling approach for large-scale analog circuits that produces compact expressions from a SPICE netlist. The presented method builds upon the state-of-the-art Trajectory PieceWise (TPW) approach. Because of their data-driven nature, TPW implementations generate models that require on-the-fly database interpolation during simulation, which standard commercial design flows do not support. Our approach solves this by recombining TPW samples as a surface in a mixed state-space/frequency domain, revealing information about the circuit's nonlinear behavior. The resulting data, termed Transfer Function Trajectories (TFT), is fitted with a parametric vector fitting algorithm and further translated to system blocks. These are compatible with VHDL-AMS/Verilog-AMS, Matlab/Simulink or hand calculations at all design stages. The models show high accuracy and a speedup of 10×–40× against the ELDO simulator for large circuits of up to 150 nodes.
Citations: 10
Debugging with dominance: On-the-fly RTL debug solution implications
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105390
Hratch Mangassarian, A. Veneris, D. Smith, Sean Safarpour
Design debugging has become a resource-intensive bottleneck in modern VLSI CAD flows, consuming as much as 60% of the total verification effort. With typical design sizes exceeding the half-million synthesized-gate mark, the growing number of blocks to be examined dramatically slows down the debugging process. The aim of this work is to reduce the number of debugging iterations needed to find all potential bugs, without affecting the debugging resolution. This is achieved by using structural dominance relationships between circuit components. More specifically, an iterative fixpoint algorithm is presented for finding dominance relationships between multiple-output blocks of the design. These relationships are then leveraged for the early discovery of potential bugs, along with their corrections, resulting in significant debugging speed-ups. Extensive experiments on real industrial designs show that 66% of solutions are discovered early due to dominator implications. This results in consistent performance gains in all cases and a 1.7× overall speed-up for finding all potential bugs, demonstrating the robustness and practicality of the proposed approach.
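The iterative fixpoint idea can be illustrated with the textbook dominator computation on a directed graph with a single root. This is only a sketch: the abstract's algorithm targets multiple-output blocks, which this simplified version does not model. The function name and the graph encoding (a dict of successor lists) are assumptions. For circuit-style dominance toward an output, one would presumably run the same routine on the reversed graph with the output as root.

```python
def dominators(succ, root):
    """Compute dominator sets by iterating to a fixpoint.

    succ maps each node to a list of its successors. Node d dominates node n
    if every path from root to n passes through d.
    """
    nodes = set(succ) | {m for targets in succ.values() for m in targets}
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for m in targets:
            pred[m].add(n)

    # Start from the most optimistic assumption and shrink the sets.
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}

    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            if pred[n]:
                new = set.intersection(*(dom[p] for p in pred[n])) | {n}
            else:
                new = {n}  # node with no predecessors: dominated only by itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


if __name__ == "__main__":
    # Diamond: a fans out to b and c, which reconverge at d, then e.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    for node, doms in sorted(dominators(graph, "a").items()):
        print(node, sorted(doms))
```

In the diamond example, neither b nor c dominates d because each can be bypassed through the other branch, which is the kind of structural fact a debugger could use to imply or rule out suspect locations.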
Citations: 5
On the preconditioner of conjugate gradient method — A power grid simulation perspective
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105374
Chung-Han Chou, Nien-Yu Tsai, Hao Yu, Che-Rung Lee, Yiyu Shi, Shih-Chieh Chang
The Preconditioned Conjugate Gradient (PCG) method has been demonstrated to be effective in solving large-scale linear systems with sparse, symmetric positive definite matrices. One critical problem in PCG is to design a good preconditioner, which can significantly reduce the runtime while keeping memory usage efficient. Universal preconditioners are simple and easy to construct, but their effectiveness is highly problem-dependent. On the other hand, domain-specific preconditioners that exploit the underlying physical meaning of the matrices usually work better, but are difficult to design. In this paper, we study the problem in the context of power grid simulation, and develop a novel preconditioner based on the power grid structure through simple circuit simulations. Experimental results show a 43% reduction in the number of iterations and a 23% speedup over existing universal preconditioners.
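For readers unfamiliar with where the preconditioner enters the iteration, the following is a minimal PCG sketch in plain Python. It uses a Jacobi (diagonal) preconditioner, which is one of the simple universal choices; the power-grid-specific preconditioner from the abstract is not described there and is not reproduced here. The dense list-of-lists matrix, the function names, and the tolerance are assumptions chosen to keep the example self-contained.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def jacobi_preconditioner(A):
    """Return a function that applies M^{-1} r with M = diag(A)."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.
    Returns the solution and the number of iterations performed."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_preconditioner(r)      # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    for k in range(max_iter):
        if sqrt(dot(r, r)) < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iterations = pcg(A, b, jacobi_preconditioner(A))
    print(x, iterations)
```

Swapping in a different preconditioner only requires passing another function for `apply_preconditioner`, which is why preconditioner design can be studied separately from the solver loop.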
Citations: 30
Statistical defect-detection analysis of test sets using readily-available tester data
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105416
Xiaochun Yu, R. D. Blanton
At substantial cost, conventional methods for evaluating test quality apply a specially generated test set to a large population of manufactured chips. In contrast, a new time-efficient framework for evaluating test quality (FETQ) that uses tester data from normal production has been developed and validated. FETQ estimates the quality of both static and adaptive test metrics, where the latter guide testing using the results of statistical data analysis. FETQ is innovative in that, instead of evaluating a single measure of effectiveness (e.g., number of unique defects detected), it provides a confidence interval of effectiveness based on the analysis of a collection of test sets. FETQ is demonstrated by measuring the chip-detection capability of several static and adaptive test metrics using tester data from actual ICs.
Citations: 1
Timing ECO optimization via Bézier curve smoothing and fixability identification
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105412
Hua-Yu Chang, I. Jiang, Yao-Wen Chang
Due to the rapidly increasing design complexity in modern IC design, more and more timing failures are detected at late stages. Metal-only ECO is an economical technique for correcting these late-found failures without delaying time-to-market. Typically, a design undergoes many ECO runs in design houses, so the usage of spare cells is of significant importance. Hence, in this paper, we aim to perform timing ECO using the fewest spare cells. We observe that a path with good timing should be geometrically smooth. Unlike the negative slack and gate delay used in most prior work, we propose a new metric of timing criticality, fixability, that considers the smoothness of critical paths. To measure the smoothness of a path, we use a Bézier curve as the golden path. Furthermore, in order to fix timing violations concurrently, we derive the dominance property to divide violated paths into independent segments. Based on Bézier curve smoothing, fixability identification, and the dominance property, we develop an efficient algorithm to fix violations. Compared with state-of-the-art work, experimental results show that our algorithm not only effectively resolves all timing violations with few spare cells but also achieves 22.8× and 42.6× speedups.
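As a rough illustration of using a Bézier curve as a reference for path smoothness, the sketch below evaluates a Bézier curve with de Casteljau's algorithm and reports how far the path's points lie from it. The abstract does not define the fixability metric or how the golden curve is constructed, so both choices here (the path's cell locations as control points, and maximum point-to-curve distance as the smoothness measure) are assumptions, as are all function names.

```python
from math import hypot


def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] using de Casteljau's
    algorithm (repeated linear interpolation between neighbouring points)."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


def max_deviation(path, samples=100):
    """Illustrative smoothness measure: the largest distance from any path
    point to the nearest sampled point of the Bézier curve whose control
    points are the path points themselves."""
    curve = [bezier_point(path, i / samples) for i in range(samples + 1)]
    return max(min(hypot(px - cx, py - cy) for cx, cy in curve)
               for px, py in path)


if __name__ == "__main__":
    straight = [(0, 0), (1, 0), (2, 0)]   # cells on a straight line
    detour = [(0, 0), (1, 2), (2, 0)]     # middle cell placed off the line
    print(max_deviation(straight))
    print(max_deviation(detour))
```

Under this toy measure a straight path scores near zero while a path with a detour scores higher, which matches the intuition that geometrically smooth paths are the ones with less wasted wire delay.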
Citations: 1
Cooperative parallelization
Pub Date : 2011-11-07 DOI: 10.5555/2132325.2132358
Praveen Yedlapalli, Emre Kultursay, M. Kandemir
We propose a cooperation between the programmer, the compiler and the runtime system to identify, exploit and efficiently exercise the parallelism available in many pointer-based applications. Our parallelization strategy, called Cooperative Parallelization, is driven by programmer directives as well as runtime information. We show that minimal information from the programmer can be combined with runtime information to extract latent parallelism in many pointer-intensive applications that involve trees and linked lists. We implemented a compilation framework that automatically parallelizes programs annotated with parallelism directives. We evaluated our approach on a collection of linked-list and tree-based applications. Our results show that we can achieve speedups of up to 15× on a sixteen-core platform. We also compared our approach to OpenMP both qualitatively and quantitatively.
Citations: 2
A framework for double patterning-enabled design
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105299
R. Ghaida, K. Agarwal, S. Nassif, Xin Yuan, L. Liebmann, Puneet Gupta
While the next generation of lithography systems is still under development, extending optical lithography using double patterning (DP) is the only solution to continue technology scaling. The biggest technical challenge of DP is the presence of mask-assignment conflicts in dense layers. In this paper, we propose a framework for DP conflict removal for standard cells. First, we offer an O(n) algorithm for mask assignment (up to 200× faster than the ILP-based approach) that guarantees a conflict-free solution if one exists. We then formulate the problem of conflict removal as a linear program (LP), which permits an extremely fast run-time (less than 10 seconds in real time for typical cells). The framework removes DP conflicts and legalizes the layout across all layers simultaneously while minimizing layout perturbation. For cells from a commercial 22nm library designed without any DP awareness, our method usually removes all DP conflicts without any area increase; for some complex cells, the method still removes all conflicts with a modest 6.7% average increase in area. The method is more general, however, and can also be applied to macro layouts and to the interconnect layers in complete designs, as we demonstrate in the paper.
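Mask assignment for double patterning is commonly modeled as two-coloring a conflict graph, where an edge joins two shapes that are too close to share a mask. The sketch below shows the standard breadth-first two-coloring, which runs in time linear in the number of shapes plus conflicts and finds an assignment whenever one exists. Whether the abstract's O(n) algorithm works this way is not stated, so this is an assumption, as are the function name and the input encoding.

```python
from collections import deque


def assign_masks(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1 so that no conflicting pair shares a
    mask. Returns a list of mask indices, or None if the conflict graph has an
    odd cycle and therefore no conflict-free assignment exists."""
    adjacency = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue                      # already colored via another shape
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]  # neighbours go on the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None            # odd cycle: unresolvable conflict
    return mask


if __name__ == "__main__":
    print(assign_masks(4, [(0, 1), (1, 2), (2, 3)]))   # chain of conflicts
    print(assign_masks(3, [(0, 1), (1, 2), (0, 2)]))   # triangle of conflicts
```

A `None` result corresponds to the situation the abstract's second stage addresses: the layout itself must be perturbed to break the odd cycle before any assignment can succeed.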
Citations: 18
PRICE: Power reduction by placement and clock-network co-synthesis for pulsed-latch designs
Pub Date : 2011-11-07 DOI: 10.1109/ICCAD.2011.6105310
Yi-Lin Chuang, Hong-Ting Lin, Tsung-Yi Ho, Yao-Wen Chang, Diana Marculescu
Pulsed latches have emerged as a popular technique for reducing the power consumption and delay of clock networks. However, the current physical synthesis flow for pulsed latches still performs circuit placement and clock-network synthesis separately, which limits the achievable power reduction. This paper presents the first work in the literature to perform placement and clock-network co-synthesis for pulsed-latch designs. By exploiting the interplay between placement and clock-network synthesis, clock-network power and timing can be optimized simultaneously. Novel progressive network forces are introduced to globally guide the placer toward iterative improvements, while the clock-network synthesizer uses the updated latch locations to optimize power and timing locally. Experimental results show that our framework substantially reduces power consumption and improves timing slack compared to existing synthesis flows.
Citations: 9