2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Articles

A scalable quantitative measure of IR-drop effects for scan pattern generation

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654130

Meng-Fan Wu, Kun-Han Tsai, Wu-Tung Cheng, Hsin-Cheih Pan, Jiun-Lang Huang, A. Kifli

Analysis of power grid IR-drop during scan test application has drawn growing attention because excessive IR-drop may cause a functionally correct device to fail at-speed testing. The analysis is challenging since the power grid IR-drop profile depends not only on the locations of the switching cells but also on the power grid structure. This paper presents a scalable implementation methodology for quantifying the IR-drop effects of a set of switching cells. An example of its application to guide power-safe scan pattern generation is illustrated. The scalability and effectiveness of the proposed quantitative measure are evaluated on a 130 nm industrial design with 800 K cells.

Citations: 5

Analysis of precision for scaling the intermediate variables in fixed-point arithmetic circuits

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.5555/2133429.2133586

O. Sarbishei, K. Radecka

This paper presents a new technique for scaling the intermediate variables in implementing fixed-point polynomial-based arithmetic circuits. Analysis of precision is first used to set the input and coefficient bit-widths of the polynomial so that a given error bound is satisfied. Then, we present an efficient approach to scale and truncate different intermediate variables without re-computing precision at each stage. After applying it to all the intermediate variables, a final precision computation and sensitivity analysis are performed to set the final values of truncation bits so that the given error bound remains satisfied. Experimental results on a set of polynomial benchmarks show the robustness and efficiency of the proposed technique.

Citations: 15

SETS: Stochastic execution time scheduling for multicore systems by joint state space and Monte Carlo

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654114

Nabeel Iqbal, J. Henkel

The advent of multicore platforms has renewed interest in scheduling techniques for real-time systems. Historically, ‘scheduling decisions’ are implemented considering fixed task execution times, as in the case of Worst Case Execution Time (WCET). The limitations of scheduling considering WCET manifest as under-utilization of resources for large application classes. In the realm of multicore systems, the notion of WCET is hardly meaningful due to the large set of factors influencing it. Within soft real-time systems, a more realistic modeling approach would be to consider tasks featuring varying execution times (i.e. stochastic). This paper addresses the problem of stochastic task execution time scheduling that is agnostic to statistical properties of the execution time. Our proposed method is orthogonal to any number of linear acyclic task graphs and their underlying architecture. The joint estimation of execution time and the associated parameters, relying on the interdependence of parallel tasks, helps build a ‘nonlinear Non-Gaussian state space’ model. To obtain nearly Bayesian estimates, irrespective of the execution time characteristics, a recursive solution of the state space model is found by means of the Monte Carlo method. The recursive solution reduces the computational and memory overhead and adapts statistical properties of execution times at run time. Finally, the variable laxity EDF scheduler schedules the tasks considering the predicted execution times. We show that variable execution time scheduling improves the utilization of resources and ensures the quality of service. Our proposed solution does not require a priori knowledge of any kind and eliminates the fundamental constraints associated with the estimation of execution times. Results clearly show the advantage of the proposed method, as it achieves 76% better task utilization, 68% more task scheduling, and a 53% reduction in deadline misses compared to current state-of-the-art methods.

Citations: 6

Bi-decomposition of large Boolean functions using blocking edge graphs

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654210

M. Choudhury, K. Mohanram

Bi-decomposition techniques are known to significantly reduce area, delay, and power during logic synthesis since they can explore multi-level AND, OR, and XOR decompositions in a scalable, technology-independent manner. The complexity of bi-decomposition techniques lies in achieving a good variable partition for the given logic function. State-of-the-art techniques use heuristics and/or brute-force enumeration for variable partitioning, which leads to sub-optimal results and/or poor scalability with function complexity. This paper describes a fast, scalable algorithm for obtaining provably optimum variable partitions for bi-decomposition of Boolean functions by constructing an undirected graph called the blocking edge graph (BEG). To the best of our knowledge, this is the first algorithm that demonstrates a systematic approach to derive disjoint and overlapping variable partitions for bi-decomposition. Since a BEG has only one vertex per input, our technique scales to Boolean functions with hundreds of inputs. Results indicate that on average, BEG-based bi-decomposition reduces the number of logic levels (mapped delay) of 16 benchmark circuits by 60%, 34%, 45%, and 30% (20%, 19%, 16% and 20%) over the best results of state-of-the-art tools FBDD, SIS, ABC, and an industry-standard synthesizer, respectively.

Citations: 25

Fast statistical timing analysis of latch-controlled circuits for arbitrary clock periods

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5653800

Bing Li, Ning Chen, Ulf Schlichtmann

Latch-controlled circuits have a remarkable advantage in timing performance as process variations become more relevant for circuit design. Existing methods of statistical timing analysis for such circuits, however, still need improvement in runtime and their results should be extended to provide yield information for any given clock period. In this paper, we propose a method combining a simplified iteration and a graph transformation algorithm. The result of this method is in a parametric form so that the yield for any given clock period can easily be evaluated. The graph transformation algorithm handles the constraints from nonpositive loops effectively, completely avoiding the heuristics used in other existing methods. Therefore the accuracy of the timing analysis is well maintained. Additionally, the proposed method is much faster than other existing methods. Especially for large circuits it offers about 100 times performance improvement in timing verification.

Citations: 7

Peak current reduction by simultaneous state replication and re-encoding

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654204

J. Gu, G. Qu, Lin Yuan, Qiang Zhou

Peak current is one of the important considerations for circuit design and testing in deep sub-micron technology. In a synchronous finite state machine (FSM), it is observed that the peak current happens at the moment of state transitions and that it has a strong correlation with the maximum number of state registers switching in the same direction simultaneously [2], which we refer to as the peak switching value (PSV). We propose an FSM synthesis method to reduce PSV by seamlessly combining state replication and state re-encoding techniques. Our experiments show that out of 52 FSM benchmarks encoded by a state-of-the-art power-driven encoding algorithm POW3 [1], 36 are not optimal in terms of PSV. Our approach can improve on 34 of them with an average 39.2% PSV reduction, while the only comparable PSV-driven FSM synthesis technique [2] can improve on 27 benchmarks with an average 24.5% reduction. Furthermore, we compare our approach with [2] after the FSMs are implemented using an industry EDA tool. The results show that our approach reduces the peak current in the circuits by 13% on average and the total power by 3% with a mere 2% overhead in area.
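
The PSV definition in this abstract can be illustrated with a short sketch. It is not the authors' synthesis method; it only evaluates the metric for a given state encoding, assuming each state code is stored as an integer bit vector and each transition is a (source, destination) pair. The function name, the three-state FSM, and both encodings are hypothetical.

```python
def peak_switching_value(encoding, transitions):
    """Return the largest number of state bits that flip in the same
    direction (all 0->1 or all 1->0) on any single transition."""
    psv = 0
    for src, dst in transitions:
        a, b = encoding[src], encoding[dst]
        rising = bin(~a & b).count("1")   # bits that go 0 -> 1
        falling = bin(a & ~b).count("1")  # bits that go 1 -> 0
        psv = max(psv, rising, falling)
    return psv


# Hypothetical 3-state cyclic FSM: S0 -> S1 -> S2 -> S0.
transitions = [("S0", "S1"), ("S1", "S2"), ("S2", "S0")]

# Encoding A: S0 -> S1 raises both bits at once.
encoding_a = {"S0": 0b00, "S1": 0b11, "S2": 0b01}
# Encoding B: every transition moves at most one bit per direction.
encoding_b = {"S0": 0b00, "S1": 0b01, "S2": 0b10}

psv_a = peak_switching_value(encoding_a, transitions)
psv_b = peak_switching_value(encoding_b, transitions)
print(psv_a, psv_b)
```

In this toy case, re-encoding alone lowers the PSV from 2 to 1; the abstract's method additionally uses state replication, which this sketch does not model.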

Citations: 14

Yield enhancement for 3D-stacked memory by redundancy sharing across dies

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654160

Li Jiang, Rong Ye, Q. Xu

Three-dimensional (3D) memory products are emerging to fulfill the ever-increasing demand for storage capacity. In 3D-stacked memory, redundancy sharing between neighboring vertical memory blocks using short through-silicon vias (TSVs) is a promising solution for yield enhancement. Since different memory dies have distinct fault bitmaps, how to selectively match them to maximize the yield of the bonded 3D-stacked memory is an interesting and relevant problem. In this paper, we present novel solutions to tackle this problem. Experimental results show that the proposed methodology can significantly increase memory yield compared to the case in which only self-reparable dies are bonded together.

Citations: 58

Synthesis of an efficient controlling structure for post-silicon clock skew minimization

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654274

Yu-Chien Kao, Hsuan-Ming Chou, Kun-Ting Tsai, Shih-Chieh Chang

Clock skew minimization has been an important design constraint. However, due to the complexity of Process, Voltage, and Temperature (PVT) variations, the minimization of clock skew has faced a great challenge. To overcome the influence of PVT variations, several previous works proposed the Post Silicon Tuning (PST) architecture to dynamically balance the skew of a clock tree. In the PST architecture, there are two main components: the Adjustable Delay Buffer (ADB) and the Phase Detector (PD). Most previous works focus on determining good positions of ADBs in a PST design. In this paper, we first show that the choice of which pairs of flip-flops (FFs) are connected to PDs, called the PD structure, also greatly influences the complexity of hardware control for a PST design. Without careful planning of the PD structure, a large number of control signals is needed to adjust the delays of ADBs. In addition, we show that the PD structure may influence the accuracy of the clock skew. Among possible connection structures, this paper proposes an efficient PD structure which not only simplifies the hardware control but also minimizes the clock skew of a PST design.

Citations: 6

Stress-driven 3D-IC placement with TSV keep-out zone and regularity study

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654245

K. Athikulwongse, A. Chakraborty, Jae-Seok Yang, D. Pan, S. Lim

Through-silicon via (TSV) fabrication causes tensile stress around TSVs, which results in significant carrier mobility variation in the devices in their neighborhood. A keep-out zone (KOZ) is a conservative way to prevent any devices/cells from being impacted by the TSV-induced stress. However, owing to the already large TSV size, a large KOZ can significantly reduce the placement area available for cells, thus requiring larger dies, which negate the improvement in wirelength and timing due to 3D integration. In this paper, we study the impact of KOZ dimension on stress, carrier mobility variation, area, wirelength, and performance of 3D ICs. We demonstrate that, instead of requiring a large KOZ, 3D-IC placers must exploit TSV stress-induced carrier mobility variation to improve the timing and area objectives during placement. We propose a new TSV stress-driven force-directed 3D placement that consistently provides placement results with, on average, 21.6% better worst negative slack (WNS) and 28.0% better total negative slack (TNS) than wirelength-driven placement.
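
The two timing metrics quoted in this abstract can be made concrete with a small sketch. It assumes the common convention that WNS is the most negative endpoint slack (zero if timing is met) and TNS is the sum of all negative endpoint slacks; tools differ in detail, and the function names and slack values below are hypothetical and do not come from the paper.

```python
def worst_negative_slack(slacks):
    """Most negative endpoint slack, or 0 if no endpoint violates timing."""
    return min(0, min(slacks))


def total_negative_slack(slacks):
    """Sum of the slacks of all violating (negative-slack) endpoints."""
    return sum(s for s in slacks if s < 0)


# Hypothetical endpoint slacks in picoseconds.
slacks_ps = [120, -300, -50, 400]

wns = worst_negative_slack(slacks_ps)
tns = total_negative_slack(slacks_ps)
print(wns, tns)
```

A placement that is "better" on these metrics moves both values toward zero: WNS reflects the single worst path, while TNS reflects how widespread the violations are.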

Citations: 96

Template-mask design methodology for double patterning technology

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pub Date : 2010-11-07 DOI: 10.1109/ICCAD.2010.5654288

Chin-Hsiung Hsu, Yao-Wen Chang, S. Nassif

Double patterning technology (DPT) has recently gained much attention and is viewed as the most promising solution for the sub-32-nm node process. DPT decomposes a layout into two masks and applies double exposure patterning to increase the pitch size and thus printability. This paper proposes the first mask-sharing methodology for DPT, which can share masks among different designs, to reduce the number of costly masks for double patterning. The design methodology consists of two tasks: template-mask design and template-mask-aware routing. A graph matching-based algorithm is developed to design a flexible template mask that tries to accommodate as many design patterns as possible. We also present a template-mask-aware routing (TMR) algorithm, focusing on DPT-related issues to generate routing solutions that satisfy the constraints induced from double patterning and template masks. Experimental results show that our designed template mask is mask-saving, and our TMR can achieve conflict-free routing with 100% routability and save at least two masks for each circuit with reasonable wirelength and runtime overheads.
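
The statement that DPT decomposes a layout into two masks is often modeled as two-coloring a conflict graph, which the following sketch illustrates. This is a generic textbook formulation and an assumption here, not the template-mask or TMR algorithm of the paper: features are numbered vertices, an edge joins two features that are too close to share a mask, and the function name and inputs are hypothetical.

```python
from collections import deque


def assign_masks(num_features, conflicts):
    """Two-color the conflict graph with BFS.

    Returns a list with mask 0 or 1 per feature, or None when an odd
    cycle makes a conflict-free two-mask decomposition impossible.
    """
    adjacency = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adjacency[u].append(v)
        adjacency[v].append(u)

    mask = [None] * num_features
    for start in range(num_features):
        if mask[start] is not None:
            continue  # already colored as part of an earlier component
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]  # neighbor goes on the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # two conflicting features on the same mask
    return mask


# A chain of four features decomposes cleanly; a triangle does not.
chain = assign_masks(4, [(0, 1), (1, 2), (2, 3)])
triangle = assign_masks(3, [(0, 1), (1, 2), (0, 2)])
print(chain, triangle)
```

An odd cycle such as the triangle is the kind of coloring conflict that a DPT-aware router would try to avoid creating in the first place.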

Citations: 2
