Modeling and estimation of power supply noise using linear programming
F. Firouzi, S. Kiamehr, M. Tahoori
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105382
Power supply noise in nano-scale VLSI is a major design concern. Due to the switching currents of various logic gates, the actual supply voltage seen by different devices fluctuates, causing extra delays and ultimately intermittent faults during operation. Therefore, accurate estimation of the worst-case scenario, i.e., the maximum noise and the input vectors causing it, is extremely important for the design, verification, and manufacturing test steps. In this paper we present a mixed-integer linear programming (MILP) model of power supply noise in digital circuits to obtain fast and accurate solutions. Compared with accurate SPICE simulations of random vectors for a set of benchmark circuits, the proposed approach achieves a 13115× speedup while obtaining 2.7% better optimization results on average.
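The abstract does not spell out the MILP formulation itself. As a loose, self-contained illustration of how "maximize switching activity over a pair of input vectors" can be cast as a mixed-integer linear program, the sketch below uses the open-source PuLP package on a hypothetical two-gate netlist with made-up per-gate current weights; it is not the paper's model.

```python
# Loose illustration (not the paper's model): find the pair of input
# vectors that maximizes weighted switching activity, cast as an MILP.
# The netlist (one AND, one OR) and current weights are made up.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

prob = LpProblem("max_switching_noise", LpMaximize)

# Two consecutive input vectors t=0 and t=1 on primary inputs a, b, c.
pins = ["a", "b", "c"]
x = {(p, t): LpVariable(f"{p}_{t}", cat=LpBinary) for p in pins for t in (0, 1)}

# Gate outputs per time frame: g1 = a AND b, g2 = g1 OR c.
g1 = {t: LpVariable(f"g1_{t}", cat=LpBinary) for t in (0, 1)}
g2 = {t: LpVariable(f"g2_{t}", cat=LpBinary) for t in (0, 1)}
for t in (0, 1):
    # AND linearization
    prob += g1[t] <= x[("a", t)]
    prob += g1[t] <= x[("b", t)]
    prob += g1[t] >= x[("a", t)] + x[("b", t)] - 1
    # OR linearization
    prob += g2[t] >= g1[t]
    prob += g2[t] >= x[("c", t)]
    prob += g2[t] <= g1[t] + x[("c", t)]

def xor_var(name, u, v):
    """Binary s = u XOR v, linearized (models a switching gate output)."""
    s = LpVariable(name, cat=LpBinary)
    prob += s >= u - v
    prob += s >= v - u
    prob += s <= u + v
    prob += s <= 2 - u - v
    return s

s1 = xor_var("sw_g1", g1[0], g1[1])
s2 = xor_var("sw_g2", g2[0], g2[1])

# Made-up per-gate switching-current weights; maximize total drawn current.
prob += lpSum([1.0 * s1, 1.8 * s2])
prob.solve()
print("max weighted switching:", value(prob.objective))
```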
{"title":"Modeling and estimation of power supply noise using linear programming","authors":"F. Firouzi, S. Kiamehr, M. Tahoori","doi":"10.1109/ICCAD.2011.6105382","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105382","url":null,"abstract":"Power supply noise in nano-scale VLSI is one of the design concerns. Due to switching current of various logic gates, the actual supply voltage seen by different devices fluctuates, causing extra delays and ultimately intermittent faults during operation. Therefore, accurate estimation of worst case scenario, maximum noise and the vectors causing it, is extremely important for design, verification, and manufacturing test steps. In this paper we present a mixed-integer linear programming modeling of power supply noise in digital circuits to obtain fast and accurate solutions. Compared with accurate SPICE simulations of random vectors for a set of benchmark circuits, the proposed approach can achieve 13115× speedup while obtains 2.7% more optimization in average.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81865672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synchronous elasticization at a reduced cost: Utilizing the ultra simple fork and controller merging
E. Kilada, K. Stevens
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105420
Latency-insensitive (LI) designs can tolerate arbitrary computation and communication latencies. Synchronous elasticization converts an ordinary clocked design into an LI one. It uses communication protocols such as the Synchronous Elastic Flow (SELF). Compared to its lazy implementations, eager SELF has no combinational cycles and can provide a performance advantage. Yet, it uses eager forks (EForks), which consume more area and power. This paper demonstrates that EForks can be redundant. A novel ultra simple fork (USFork) implementation is introduced. The conditions under which an EFork behaves exactly the same as a USFork (from the protocol perspective) are formally derived. The paper also investigates the conditions under which multiple SELF controllers can be merged to further decrease the area and power overhead (as long as the physical placement allows). The flow has been integrated into a fully automated tool, the Hybrid GENerator (HGEN), which selectively replaces redundant EForks with USForks and, optionally, merges equivalent controllers. HGEN uses the 6thSense tool as an embedded verification engine. Compared to the methodology used in published work on a MiniMIPS processor case study, HGEN shows up to 34.3% and 25.4% savings in area and power, respectively, due to utilizing USForks. It also shows at least a 32% saving in the number of EForks in the ISCAS s382 benchmark. More reduction is possible if the physical placement allows for controller merging. Thanks to advances in synchronous verification technology, HGEN runs within a few minutes for all examples in this paper. This makes the proposed approach suitable for tight time-to-market constraints.
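As a behavioral intuition for the eager-versus-lazy fork trade-off (not the gate-level SELF protocol, the paper's EFork, or its USFork), the toy simulation below compares a fork that needs all consumers ready in the same cycle with one that remembers, per branch, which consumers have already taken the current token; the consumer readiness probabilities are made up.

```python
# Behavioral sketch only: lazy fork delivers a token only when every
# consumer is ready in the same cycle; eager fork keeps per-branch
# "already delivered" bits so each consumer can take the token as soon
# as it is individually ready.
import random

def simulate(eager, tokens=1000, p_ready=0.5, n_cons=2, seed=0):
    rng = random.Random(seed)
    cycles, delivered = 0, 0
    done = [False] * n_cons          # per-branch delivery bits (eager only)
    while delivered < tokens:
        cycles += 1
        ready = [rng.random() < p_ready for _ in range(n_cons)]
        if eager:
            for i in range(n_cons):
                done[i] = done[i] or ready[i]
            if all(done):
                delivered += 1
                done = [False] * n_cons
        else:
            if all(ready):           # lazy: needs simultaneous readiness
                delivered += 1
    return cycles

print("lazy fork :", simulate(eager=False), "cycles for 1000 tokens")
print("eager fork:", simulate(eager=True), "cycles for 1000 tokens")
```

If the consumers never stall independently of each other, the two schemes deliver tokens at the same rate, which gives a flavor of why an eager fork's extra state can be redundant in practice.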
{"title":"Synchronous elasticization at a reduced cost: Utilizing the ultra simple fork and controller merging","authors":"E. Kilada, K. Stevens","doi":"10.1109/ICCAD.2011.6105420","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105420","url":null,"abstract":"Latency insensitive (LI) designs can tolerate arbitrary computation and communication latencies. Synchronous elasticization converts an ordinary clocked design into LI. It uses communication protocols such as the Synchronous Elastic Flow (SELF). Comparing to its lazy implementations, eager SELF has no combinational cycles and can provide performance advantage. Yet, it uses eager forks (EForks) consuming more area and power. This paper demonstrates that EForks can be redundant. A novel ultra simple fork (USFork) implementation is introduced. The conditions under which an EFork will behave exactly the same as a USFork (from the protocol perspective) are formally derived. The paper also investigates the conditions under which multiple SELF controllers can be merged to further decrease the area and power overhead (as long as the physical placement allows). The flow has been integrated in a fully automated tool, HGEN. Hybrid GENerator (HGEN) selectively replaces redundant EForks with USForks and, optionally, merges equivalent controllers. HGEN uses 6thSense tool as an embedded verification engine. Comparing to the methodology used in published work on a MiniMIPS processor case study, HGEN shows up to 34.3% and 25.4% savings in area and power due to utilizing USForks. It also shows at least 32% saving in the number of EForks in s382 ISCAS benchmark. More reduction is possible if the physical placement allows for controller merging. Thanks to the advance in synchronous verification technology, HGEN runs within a few minutes (for all this paper examples). This makes the proposed approach suitable for tight time-to-market constraints.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82051795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward efficient spatial variation decomposition via sparse regression
Wangyang Zhang, K. Balakrishnan, Xin Li, D. Boning, Rob A. Rutenbar
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105321
In this paper, we propose a new technique to accurately decompose process variation into two different components: (1) spatially correlated variation, and (2) uncorrelated random variation. Such variation decomposition is important for identifying systematic variation patterns at the wafer and/or chip level for process modeling, control, and diagnosis. We demonstrate that spatially correlated variation carries a unique sparse signature in the frequency domain. Based upon this observation, an efficient sparse regression algorithm is applied to accurately separate spatially correlated variation from uncorrelated random variation. An important contribution of this paper is a fast numerical algorithm that reduces the computational time of sparse regression by several orders of magnitude over the traditional implementation. Our experimental results based on silicon measurement data demonstrate that the proposed sparse regression technique can capture spatially correlated variation patterns with high accuracy. The estimation error is reduced by more than 3.5× compared to traditional methods.
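A minimal, synthetic illustration of the key observation (not the paper's sparse regression algorithm): a smooth, spatially correlated wafer pattern is sparse in the DCT frequency domain, so keeping only the dominant DCT coefficients of a noisy measurement map separates it from uncorrelated random variation. The wafer map, the pattern, and the coefficient count below are made up, and hard thresholding stands in for the actual sparse regression.

```python
# Synthetic demo: recover the spatially correlated component of a noisy
# wafer map by keeping only the dominant DCT coefficients.
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
n = 32                                    # 32x32 grid of die measurements
y_idx, x_idx = np.mgrid[0:n, 0:n] / (n - 1)
correlated = 0.6 * np.cos(np.pi * x_idx) + 0.4 * x_idx * y_idx   # smooth pattern
random_part = 0.2 * rng.standard_normal((n, n))                  # uncorrelated noise
measured = correlated + random_part

coeffs = dctn(measured, norm="ortho")
k = 16                                    # assume ~16 dominant DCT terms
thresh = np.sort(np.abs(coeffs).ravel())[-k]
sparse_coeffs = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
estimated = idctn(sparse_coeffs, norm="ortho")

err = np.sqrt(np.mean((estimated - correlated) ** 2))
print(f"RMS error of recovered correlated component: {err:.3f}")
```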
{"title":"Toward efficient spatial variation decomposition via sparse regression","authors":"Wangyang Zhang, K. Balakrishnan, Xin Li, D. Boning, Rob A. Rutenbar","doi":"10.1109/ICCAD.2011.6105321","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105321","url":null,"abstract":"In this paper, we propose a new technique to accurately decompose process variation into two different components: (1) spatially correlated variation, and (2) uncorrelated random variation. Such variation decomposition is important to identify systematic variation patterns at wafer and/or chip level for process modeling, control and diagnosis. We demonstrate that spatially correlated variation carries a unique sparse signature in frequency domain. Based upon this observation, an efficient sparse regression algorithm is applied to accurately separate spatially correlated variation from uncorrelated random variation. An important contribution of this paper is to develop a fast numerical algorithm that reduces the computational time of sparse regression by several orders of magnitude over the traditional implementation. Our experimental results based on silicon measurement data demonstrate that the proposed sparse regression technique can capture spatially correlated variation patterns with high accuracy. The estimation error is reduced by more than 3.5× compared to other traditional methods.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80283932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feedback control based cache reliability enhancement for emerging multicores
Hui Zhao, Akbar Sharifi, Shekhar Srikantaiah, M. Kandemir
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105305
Focusing on data reliability, we propose a control-theory-centric approach designed to improve transient error resilience in the shared caches of emerging multicores while satisfying performance goals. The proposed scheme takes as input two quality of service (QoS) specifications: a performance QoS and a reliability QoS. The first indicates the minimum acceptable workload-wide L2 cache hit rate, whereas the second captures the reliability bound on a per-application basis, with the help of a metric called Reads-with-Replica (RwR). We present an extensive experimental evaluation of the proposed scheme on various workloads formed from applications in the SPEC2006 benchmark suite. The proposed scheme is able to satisfy, in most of the tested cases, both performance and reliability QoS targets by successfully modulating the total size of the data replication area and the partitioning of this area among co-runner applications. The collected results also show that our scheme achieves consistent improvements under different values of the major simulation parameters.
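The abstract describes a feedback controller that modulates the replication area to meet the QoS targets, but the controller design and cache model are not given. The sketch below is a toy integral-control loop driving a hypothetical, monotone hit-rate model; it only illustrates the control-theoretic flavor, not the paper's controller.

```python
# Toy sketch: an integral controller adjusts the fraction of a shared L2
# devoted to data replicas so that a made-up "plant" settles at the
# target hit rate (more replicas -> better reliability, lower hit rate).
def plant_hit_rate(replica_frac):
    # Hypothetical monotone model of the cache, not measured data.
    return 0.92 - 0.25 * replica_frac

target_hit_rate = 0.85
replica_frac = 0.5          # actuator: fraction of cache used for replicas
k_i = 0.8                   # integral gain

for epoch in range(20):
    error = plant_hit_rate(replica_frac) - target_hit_rate
    # Above the hit-rate QoS: afford more replication; below it: shrink.
    replica_frac = min(1.0, max(0.0, replica_frac + k_i * error))

print(f"settled replica fraction: {replica_frac:.3f}, "
      f"hit rate: {plant_hit_rate(replica_frac):.3f}")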
{"title":"Feedback control based cache reliability enhancement for emerging multicores","authors":"Hui Zhao, Akbar Sharifi, Shekhar Srikantaiah, M. Kandemir","doi":"10.1109/ICCAD.2011.6105305","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105305","url":null,"abstract":"Focusing on data reliability, we propose a control theory centric approach designed to improve transient error resilience in shared caches of emerging multicores while satisfying performance goals. The proposed scheme takes, as input, two quality of service (QoS) specifications: performance QoS and reliability QoS. The first of these indicates the minimum workload-wide cache (L2) hit rate value acceptable, whereas the second one captures the reliability bound on an application basis, with the help of a metric called the Reads-with-Replica (RwR). We present an extensive experimental evaluation of the proposed scheme on various workloads formed using the applications from the SPEC2006 benchmark suite. The proposed scheme is able to satisfy, in most of the tested cases, both performance and reliability QoS targets, by successfully modulating the total size of the data replication area and partitioning of this area among the co-runner applications. The collected results also show that our scheme achieves consistent improvements under different values of the major simulation parameters.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82919178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combined loop transformation and hierarchy allocation for data reuse optimization
J. Cong, Peng Zhang, Yi Zou
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105324
External memory bandwidth is a crucial bottleneck, for both performance and power consumption, in the majority of computation-intensive applications. Data reuse is an important technique for reducing external memory accesses by utilizing the memory hierarchy. Loop transformation for data locality and memory hierarchy allocation are the two major steps in a data reuse optimization flow, but they have traditionally been carried out independently. This paper presents a combined approach that optimizes loop transformation and memory hierarchy allocation simultaneously to achieve globally optimal results on external memory bandwidth and on-chip data reuse buffer size. We develop an efficient and optimal solution to the combined problem by decomposing the solution space into two subspaces with linear and nonlinear constraints, respectively. We show that we can significantly prune the solution space without losing optimality. Experimental results show that our scheme can save up to 31% of on-chip memory size compared to the separated two-step method when the memory hierarchy allocation problem is not trivial. The run-time complexity is also acceptable for practical cases.
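To make the coupling between loop transformation and reuse-buffer size concrete (a generic illustration, not the paper's formulation), the sketch below counts external fetches of a reused array under two loop orders and two on-chip buffer capacities, using a simple LRU buffer model.

```python
# Toy experiment: for out[i] += A[i][j] * x[j], the loop order decides how
# much on-chip buffer is needed to reuse x[] without refetching it from
# external memory.  Only accesses to x are traced; buffer is LRU.
from collections import OrderedDict

def external_fetches(access_trace, buffer_elems):
    lru, fetches = OrderedDict(), 0
    for addr in access_trace:
        if addr in lru:
            lru.move_to_end(addr)          # on-chip hit, data reused
        else:
            fetches += 1                   # external memory access
            lru[addr] = True
            if len(lru) > buffer_elems:
                lru.popitem(last=False)
    return fetches

N = 64
trace_ij = [("x", j) for i in range(N) for j in range(N)]   # i outer, j inner
trace_ji = [("x", j) for j in range(N) for i in range(N)]   # j outer, i inner

for buf in (8, 64):
    print(f"buffer={buf:3d}  i-j order: {external_fetches(trace_ij, buf):5d}  "
          f"j-i order: {external_fetches(trace_ji, buf):5d}")
```

With an 8-element buffer, interchanging the loops cuts external fetches of x from 4096 to 64; with a 64-element buffer both orders need only 64, showing why the loop transformation and the buffer allocation should be chosen together.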
{"title":"Combined loop transformation and hierarchy allocation for data reuse optimization","authors":"J. Cong, Peng Zhang, Yi Zou","doi":"10.1109/ICCAD.2011.6105324","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105324","url":null,"abstract":"External memory bandwidth is a crucial bottleneck in the majority of computation-intensive applications for both performance and power consumption. Data reuse is an important technique for reducing the external memory access by utilizing the memory hierarchy. Loop transformation for data locality and memory hierarchy allocation are two major steps in data reuse optimization flow. But they were carried out independently. This paper presents a combined approach which optimizes loop transformation and memory hierarchy allocation simultaneously to achieve global optimal results on external memory bandwidth and on-chip data reuse buffer size. We develop an efficient and optimal solution to the combined problem by decomposing the solution space into two subspaces with linear and nonlinear constraints respectively. We show that we can significantly prune the solution space without losing its optimality. Experimental results show that our scheme can save up to 31% of on-chip memory size compared to the separated two-step method when the memory hierarchy allocation problem is not trivial. Also, run-time complexity is acceptable for the practical cases.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88183370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electromigration modeling and full-chip reliability analysis for BEOL interconnect in TSV-based 3D ICs
M. Pathak, Jiwoo Pak, D. Pan, S. Lim
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105385
Electromigration (EM) is a critical problem for the interconnect reliability of modern integrated circuits (ICs), especially as feature sizes shrink. In three-dimensional (3D) IC technology, the EM problem becomes more severe due to drastic dimension mismatches between metal wires, through-silicon vias (TSVs), and landing pads. Meanwhile, the thermo-mechanical stress induced by TSVs can also reduce the failure time of wires. However, there are very few studies on EM issues that consider TSVs in 3D ICs. In this paper, we show the impact of TSV stress on the EM failure time of metal wires in 3D ICs. We model the impact of TSVs on stress variation in wires and then perform detailed modeling of the impact of stress on the EM failure time of metal wires. Based on our analysis, we build a detailed library to predict the failure time of a given wire based on current density, temperature, and stress. We then propose a method to perform fast full-chip simulation to determine the various EM-related hot spots in the design. We also propose a simple routing-blockage scheme to reduce the EM-related failures near the TSVs and evaluate its impact on various metrics.
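For background only, the classical Black's equation relates EM mean time to failure to current density and temperature. The sketch below uses illustrative constants, not the calibrated library from the paper, and it does not include the paper's additional TSV-stress dependence.

```python
# Background sketch: Black's equation, MTTF = A * J**(-n) * exp(Ea / (k*T)).
# Constants are illustrative (n and Ea vary by technology) and a_const only
# sets arbitrary units.
import math

K_BOLTZMANN_EV = 8.617e-5        # Boltzmann constant in eV/K

def black_mttf(j_rel, temp_k, a_const=1.0, n_exp=2.0, ea_ev=0.85):
    """Relative MTTF from Black's equation for a relative current density."""
    return a_const * j_rel ** (-n_exp) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

# Doubling the current density at 378 K (105 C) cuts MTTF by 4x for n = 2.
base = black_mttf(1.0, 378.0)
hot = black_mttf(2.0, 378.0)
print(f"MTTF ratio at 2x current density: {hot / base:.2f}")
```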
{"title":"Electromigration modeling and full-chip reliability analysis for BEOL interconnect in TSV-based 3D ICs","authors":"M. Pathak, Jiwoo Pak, D. Pan, S. Lim","doi":"10.1109/ICCAD.2011.6105385","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105385","url":null,"abstract":"Electromigration (EM) is a critical problem for interconnect reliability of modern integrated circuits (ICs), especially as the feature size becomes smaller. In three-dimensional (3D) IC technology, the EM problem becomes more severe due to drastic dimension mismatches between metal wires, through silicon vias (TSVs), and landing pads. Meanwhile, the thermo-mechanical stress due to the TSV can also cause reduction in the failure time of wires. However, there is very little study on EM issues that consider TSVs in 3D ICs. In this paper, we show the impact of TSV stress on EM failure time of metal wires in 3D ICs. We model the impact of TSV on stress variation in wires. We then perform detailed modeling of the impact of stress on EM failure time of metal wires. Based on our analysis, we build a detailed library to predict the failure time of a given wire based on current density, temperature and stress. We then propose a method to perform fast full-chip simulation, to determine the various EM related hot-spots in the design. We also propose a simple routing-blockage scheme to reduce the EM related failures near the TSVs, and see its impact on various metrics.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88269149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic formal verification of multithreaded pipelined microprocessors
M. Velev, Ping Gao
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105403
We present highly automated techniques for formal verification of pipelined microprocessors with hardware support for multithreading. The processors are modeled at a high level of abstraction, using a subset of Verilog, in a way that allows us to exploit the property of Positive Equality, which results in significant simplifications of the solution space and orders-of-magnitude speedups relative to previous methods. We propose abstraction techniques that produce at least three orders of magnitude speedup, a speedup that increases with the number of threads implemented in a pipelined processor. To the best of our knowledge, this is the first work on automatic formal verification of pipelined processors with hardware support for multithreading.
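The paper's term-level models and its Positive Equality machinery cannot be reconstructed from the abstract. As a small taste of the underlying style of reasoning, the hypothetical z3py fragment below leaves the ALU and register file uninterpreted and checks a forwarding-mux property for every possible interpretation of the datapath; it is not the paper's method or benchmarks.

```python
# Term-level modeling with uninterpreted functions (illustrative only).
from z3 import DeclareSort, Function, Consts, If, Not, Solver

Word = DeclareSort("Word")
Reg = DeclareSort("Reg")

rf = Function("rf", Reg, Word)             # register file before write-back
alu = Function("alu", Word, Word, Word)    # uninterpreted ALU

dest, src1, src2 = Consts("dest src1 src2", Reg)
a, b = Consts("a b", Word)
wdata = alu(a, b)                          # result the previous instruction writes

def read_after_writeback(r):               # architectural view: RF already updated
    return If(r == dest, wdata, rf(r))

def bypass(r):                             # pipeline view: forwarding mux
    return If(r == dest, wdata, rf(r))

spec = alu(read_after_writeback(src1), read_after_writeback(src2))
good = alu(bypass(src1), bypass(src2))
buggy = alu(rf(src1), rf(src2))            # forwarding mux forgotten

s = Solver(); s.add(Not(spec == good))
print("with forwarding   :", s.check())    # unsat: equivalent for every ALU/RF

s = Solver(); s.add(Not(spec == buggy))
print("without forwarding:", s.check())    # sat: a counterexample exists
```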
{"title":"Automatic formal verification of multithreaded pipelined microprocessors","authors":"M. Velev, Ping Gao","doi":"10.1109/ICCAD.2011.6105403","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105403","url":null,"abstract":"We present highly automatic techniques for formal verification of pipelined microprocessors with hardware support for multithreading. The processors are modeled at a high level of abstraction, using a subset of Verilog, in a way that allows us to exploit the property of Positive Equality that results in significant simplifications of the solution space, and orders of magnitude speedup relative to previous methods. We propose abstraction techniques that produce at least 3 orders of magnitude speedup, which is increasing with the number of threads implemented in a pipelined processor. To the best of our knowledge, this is the first work on automatic formal verification of pipelined processors with hardware support for multithreading.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77529348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Useful-skew clock optimization for multi-power mode designs
Hsuan-Ming Chou, Hao Yu, Shih-Chieh Chang
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105398
Instead of being minimized, clock skew can be exploited to improve circuit performance. However, it is difficult to apply useful skew to a design with multiple power modes: with only one clock tree, a skew that is useful in one power mode may be harmful in another. In this paper, we propose to use adjustable delay buffers (ADBs) to construct a tunable clock tree so that useful skew can be assigned for each power mode separately. Assuming the positions of the ADBs are determined, we assign the ADB delays for each power mode by linear programming (LP). A speedup theorem is then proposed to greatly reduce the number of LP inequalities. We also propose an efficient method to select the positions of the ADBs. Our experimental results show that, on average, 99.45% of the inequalities are eliminated and an average performance improvement of 27.35% is obtained compared with the commercial tool SOC Encounter™.
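As a generic illustration of the LP step described above (made-up latencies and skew windows, not the paper's formulation or its speedup theorem), the sketch below picks non-negative ADB delays for one power mode with scipy.optimize.linprog so that every launch/capture pair stays inside its allowed skew window while the total inserted delay is minimized.

```python
# Small LP in the spirit of the abstract (hypothetical numbers).
import numpy as np
from scipy.optimize import linprog

t = np.array([0.0, 0.3, 0.1])        # hypothetical clock-tree latencies (ns)
# (launch sink, capture sink, min allowed skew, max allowed skew) per arc.
arcs = [(0, 1, -0.1, 0.2), (1, 2, -0.1, 0.2)]

A_ub, b_ub = [], []
for i, j, lo, hi in arcs:
    row = np.zeros(3)
    row[j], row[i] = 1.0, -1.0       # skew = (d_j + t_j) - (d_i + t_i)
    A_ub.append(row);  b_ub.append(hi - (t[j] - t[i]))
    A_ub.append(-row); b_ub.append((t[j] - t[i]) - lo)

res = linprog(c=np.ones(3), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * 3, method="highs")
print("ADB delays (ns):", np.round(res.x, 3))
```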
{"title":"Useful-skew clock optimization for multi-power mode designs","authors":"Hsuan-Ming Chou, Hao Yu, Shih-Chieh Chang","doi":"10.1109/ICCAD.2011.6105398","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105398","url":null,"abstract":"Instead of minimizing clock skew, skew can be useful to improve circuit performance. However, it is difficult to apply useful skew to a design with complicated power modes. With only one clock tree, useful skew in one power mode may be harmful in another power mode. In this paper, we propose to use adjustable delay buffers (ADBs) to construct a tunable clock tree so that useful skew can be assigned for different power modes. Assuming positions of ADBs are determined, we assign delays of ADBs for each power mode by LP. Then a speedup theorem is proposed to greatly reduce LP inequalities. We also propose an efficient method to select positions of ADBs. Our experimental results show that average 99.45% inequities are decreased and an average performance improvement of 27.35% is obtained compared with commercial tool SOC Encounter™.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90915888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU programming for EDA with OpenCL
R. Topaloglu, Benedict R. Gaster
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105306
Graphics processing unit (GPU) computing has been an interesting area of research in the last few years. While the initial adopters of the technology came from the image processing domain, owing to the difficulty of programming GPUs, research on programming languages has made it possible for people without knowledge of low-level graphics APIs such as OpenGL to develop code for GPUs. Two main GPU architectures, from AMD (formerly ATI) and NVIDIA, gained ground. AMD adapted Stanford's Brook language and made it into an architecture-agnostic programming model. NVIDIA, on the other hand, brought the CUDA framework to a wide audience. While the two languages have their pros and cons, such as Brook not scaling as well and CUDA having to account for architecture-level decisions, it has not been possible to compile code written for one architecture on the other, or across platforms. Another opportunity came with the idea of combining one or more CPUs and GPUs on the same die. By eliminating some of the interconnect bandwidth issues, this combination makes it possible to offload highly parallel tasks to the GPU. The technological shift toward multicores in CPU-only architectures also requires a change in programming methodology and acts as a catalyst for suitable programming languages. Hence, a unified language that can target multicore CPUs, GPUs, and their combinations has gained interest. The Open Computing Language (OpenCL), originally developed by Apple, standardized by the Khronos Group, and supported by both AMD and NVIDIA, is seen as the programming language of choice for parallel programming. In this paper, we provide the motivation for our tutorial talk on the use of OpenCL for GPUs and highlight key features of the language. We also provide research directions on OpenCL for EDA. In our tutorial talk, we use EDA as our application domain to get readers started with programming the rising language of parallelism, OpenCL.
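As a companion to the tutorial pitch above (not taken from the tutorial itself), here is the canonical minimal OpenCL example, vector addition, written with the PyOpenCL bindings rather than the C host API; it assumes the pyopencl package and an installed OpenCL runtime for some CPU or GPU device.

```python
# Minimal OpenCL vector addition through PyOpenCL.
import numpy as np
import pyopencl as cl

a = np.random.rand(1 << 16).astype(np.float32)
b = np.random.rand(1 << 16).astype(np.float32)

ctx = cl.create_some_context()            # pick any available CPU/GPU device
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
c_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel itself is plain OpenCL C, portable across AMD and NVIDIA.
program = cl.Program(ctx, """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c) {
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
""").build()

program.vadd(queue, a.shape, None, a_buf, b_buf, c_buf)
c = np.empty_like(a)
cl.enqueue_copy(queue, c, c_buf)
print("max error:", np.max(np.abs(c - (a + b))))
```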
{"title":"GPU programming for EDA with OpenCL","authors":"R. Topaloglu, Benedict R. Gaster","doi":"10.1109/ICCAD.2011.6105306","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105306","url":null,"abstract":"Graphical processing unit (GPU) computing has been an interesting area of research in the last few years. While initial adapters of the technology have been from image processing domain due to difficulties in programming the GPUs, research on programming languages made it possible for people without the knowledge of low-level programming languages such as OpenGL develop code on GPUs. Two main GPU architectures from AMD (former ATI) and NVIDIA acquired grounds. AMD adapted Stanford's Brook language and made it into an architecture-agnostic programming model. NVIDIA, on the other hand, brought CUDA framework to a wide audience. While the two languages have their pros and cons, such as Brook not being able to scale as well and CUDA having to account for architectural-level decisions, it has not been possible to compile one code on another architecture or across platforms. Another opportunity came with the introduction of the idea of combining one or more CPUs and GPUs on the same die. Eliminating some of the interconnection bandwidth issues, this combination makes it possible to offload tasks with high parallelism to the GPU. The technological direction towards multicores for CPU-only architectures also require a programming methodology change and act as a catalyst for suitable programming languages. Hence, a unified language that can be used both on multiple core CPUs as well as GPUs and their combinations has gained interest. Open Computing Language (OpenCL), developed originally by the Khronos Group of Apple and supported by both AMD and NVIDIA, is seen as the programming language of choice for parallel programming. In this paper, we provide a motivation for our tutorial talk on usage of OpenCL for GPUs and highlight key features of the language. We provide research directions on OpenCL for EDA. In our tutorial talk, we use EDA as our application domain to get the readers started with programming the rising language of parallelism, OpenCL.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91121458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Algorithmic tuning of clock trees and derived non-tree structures
I. Markov, Dongjin Lee
Pub Date: 2011-11-07. DOI: 10.1109/ICCAD.2011.6105342
This mini-tutorial covers recent research on clock-network tuning. It starts with the SPICE-accurate optimizations used in winning entries at the ISPD 2009 and 2010 clock-network synthesis contests. After comparing clock trees to meshes, it outlines a recent redundant clock-network topology that retains most of the advantages of clock trees while improving robustness to PVT variations. It also shows how to incorporate clock-network synthesis into global placement to reduce dynamic power and insertion delay.
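As a back-of-the-envelope companion to the tutorial topics (first-order Elmore delay, not the SPICE-accurate models the tutorial covers), the sketch below computes sink delays and the resulting skew for a tiny made-up RC clock tree.

```python
# Elmore delay to each sink of a small, made-up RC clock tree.
# node: (parent, wire resistance from parent [ohm], node capacitance [fF])
tree = {
    "root": (None, 0.0, 10.0),
    "a":    ("root", 50.0, 20.0),
    "b":    ("root", 80.0, 20.0),
    "s1":   ("a", 40.0, 35.0),      # sink flip-flop loads
    "s2":   ("a", 60.0, 35.0),
    "s3":   ("b", 30.0, 35.0),
}

def subtree_cap(node):
    return tree[node][2] + sum(subtree_cap(c) for c, (p, _, _) in tree.items() if p == node)

def elmore_delay(sink):
    delay, node = 0.0, sink
    while tree[node][0] is not None:            # walk sink -> root
        parent, r_edge, _ = tree[node]
        delay += r_edge * subtree_cap(node)     # R of edge * downstream C
        node = parent
    return delay * 1e-3                         # ohm * fF -> ps

delays = {s: elmore_delay(s) for s in ("s1", "s2", "s3")}
print({s: round(d, 2) for s, d in delays.items()}, "ps")
print("skew:", round(max(delays.values()) - min(delays.values()), 2), "ps")
```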
{"title":"Algorithmic tuning of clock trees and derived non-tree structures","authors":"I. Markov, Dongjin Lee","doi":"10.1109/ICCAD.2011.6105342","DOIUrl":"https://doi.org/10.1109/ICCAD.2011.6105342","url":null,"abstract":"This mini-tutorial covers recent research on clock-network tuning. It starts with SPICE-accurate optimizations used in winning entries at the ISPD 2009 and 2010 clock-network synthesis contests. After comparing clock trees to meshes, it outlines a recent redundant clock-network topology that retains most advantages of clock trees, but improves robustness to PVT variations. It also shows how to incorporate clock-network synthesis into global placement to reduce dynamic power and insertion delay.","PeriodicalId":6357,"journal":{"name":"2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91219504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}