
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD): Latest Papers

Design of accurate stochastic number generators with noisy emerging devices for stochastic computing
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203837
Meng Yang, J. Hayes, Deliang Fan, Weikang Qian
Stochastic computing (SC) is an unconventional computing paradigm that operates on stochastic bit streams. It has gained attention recently because of the very low area and power needs of its computing core. SC relies on stochastic number generators (SNGs) to map input binary numbers to stochastic bit streams. A conventional SNG comprises a random number source (RNS), typically an LFSR, and a comparator. It needs far more area and power than the SC core, offsetting the latter's main advantages. To mitigate this problem, SNGs employing emerging nanoscale devices such as memristors and spintronic devices have been proposed. However, these devices tend to have large errors in their output probabilities due to unpredictable variations in their fabrication processes and noise in their control signals. We present a novel method of exploiting such devices to design a highly accurate SNG. It is built around an RNS that generates uniformly distributed random numbers under ideal (nominal) conditions. It also has a novel error-cancelling probability conversion circuit (ECPCC) that guarantees very high accuracy in the output probability under realistic conditions when the RNS is subject to errors. An ECPCC can also be used to generate maximally correlated stochastic streams, a useful property for some applications.
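The conventional SNG described above (a random number source feeding a comparator) can be sketched in a few lines. This is a minimal illustration, not the ECPCC design from the paper: the 8-bit width, the seed value, the tap set (8, 6, 5, 4) and the function names `lfsr8`, `sng`, and `stream_value` are all assumptions chosen for the example.

```python
from itertools import islice

def lfsr8(seed=0xE1):
    """Yield successive states of an 8-bit Fibonacci LFSR (assumed taps 8, 6, 5, 4)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    while True:
        yield state
        # XOR the tapped bits to form the bit shifted in at the top.
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (feedback << 7)

def sng(x, length, seed=0xE1):
    """Comparator-based SNG: emit 1 whenever the random number is below x."""
    return [1 if r < x else 0 for r in islice(lfsr8(seed), length)]

def stream_value(bits):
    """Probability encoded by a unipolar stochastic bit stream."""
    return sum(bits) / len(bits)

if __name__ == "__main__":
    bits = sng(128, 255)          # binary input 128 out of 256
    print(stream_value(bits))     # close to 0.5
```

The fraction of ones in the stream approximates x/256, which is the mapping from binary numbers to stochastic bit streams that the abstract refers to.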
Citations: 6
A novel damped-wave framework for macro placement
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203819
Chin-Hao Chang, Yao-Wen Chang, Tung-Chieh Chen
In this paper, we present a damped-wave constructive macro placement framework that packs big macros to optimize wirelength and routability simultaneously. Unlike traditional V-shaped and Λ-shaped multilevel frameworks, which might lack local and global information, respectively, during processing, our damped-wave framework considers both local and global information through two major techniques: (1) macro clustering to improve scalability, and (2) constructive macro declustering to help a standard-cell placer obtain better solutions. We also present a macro-grouping cost model to remedy a key drawback of existing three-stage mixed-size placers (consisting of prototyping, macro placement, and standard-cell placement): they ignore the mismatches in standard-cell locations between the prototyping stage and the final standard-cell placement stage. We further propose a regularity penalty model that guides macros to form an integral, regular region during macro placement, facilitating the subsequent placement of standard cells. Compared with manual placements from industry and with a leading mixed-size placer, experimental results show that our damped-wave multilevel framework and cost models are efficient and effective in reducing half-perimeter wirelength, routed wirelength, and overflows. In particular, our work provides a new research direction on effective frameworks for large-scale designs, which readily applies to many optimization problems limited by scalability.
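Half-perimeter wirelength (HPWL), one of the metrics reported above, is the width plus the height of the bounding box around a net's pins. The sketch below shows only that standard metric; the function names and the pin coordinates are hypothetical, and nothing here reflects the paper's cost models.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net, given its (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # Bounding-box width plus bounding-box height.
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    """Sum of HPWL over all nets of a placement."""
    return sum(hpwl(pins) for pins in nets)

if __name__ == "__main__":
    nets = [
        [(0, 0), (3, 4), (1, 2)],   # three-pin net
        [(5, 5), (6, 9)],           # two-pin net
    ]
    print(total_hpwl(nets))
```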
Citations: 11
Dynamic partitioning to mitigate stuck-at faults in emerging memories
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203839
Jiangwei Zhang, Donald Kline, Liang Fang, R. Melhem, A. Jones
Emerging non-volatile memories have many advantages over conventional memory. Unfortunately, many are susceptible to write endurance challenges, resulting in stuck-at faults. Existing mitigation methods statically partition and invert data within a block containing such faults (partition-and-flip) to ensure that data is written to match the stuck-at cells, so that those cells can remain in service. Unfortunately, these schemes have limited fault tolerance capabilities and require the assumption that their auxiliary bits are fault free. We propose a dynamic partitioning scheme that increases the number of tolerated stuck-at faults and simultaneously protects auxiliary bits. Dynamic partitioning can significantly improve fault tolerance over existing static partitioning approaches with an equal number of auxiliary bits. Moreover, it can often still improve fault tolerance while reducing the number of auxiliary bits. Compared to flip-N-write and Aegis, a leading mitigation scheme, dynamic partitioning can achieve 7–72% and 5–53× lower write error rates, respectively, for the same capacity overhead at a stuck-at fault rate of 10⁻³.
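The static partition-and-flip idea that the abstract uses as its baseline can be sketched as follows. This is an illustration of the general technique only, not the paper's dynamic scheme: fixed-size groups, one flip bit per group, and the names `encode_block` and `decode_block` are assumptions, and the flip bits are treated as fault free, which is exactly the limitation the abstract points out.

```python
def encode_block(data, stuck, group_size):
    """Store `data` so every stuck-at cell already holds the value written to it.

    data:  list of 0/1 bits to write
    stuck: dict mapping cell index -> value the cell is stuck at
    Returns (stored_bits, flip_bits), or None if some group cannot be matched.
    """
    stored, flips = [], []
    for start in range(0, len(data), group_size):
        group = data[start:start + group_size]
        faults = {i - start: v for i, v in stuck.items()
                  if start <= i < start + len(group)}
        for flip in (0, 1):
            candidate = [b ^ flip for b in group]
            if all(candidate[i] == v for i, v in faults.items()):
                stored.extend(candidate)
                flips.append(flip)
                break
        else:
            return None  # neither polarity matches all stuck cells in this group
    return stored, flips

def decode_block(stored, flips, group_size):
    """Undo the per-group inversion recorded in the flip bits."""
    return [b ^ flips[i // group_size] for i, b in enumerate(stored)]

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    result = encode_block(data, {1: 1, 6: 1}, group_size=4)
    print(result)
```

A group fails as soon as it contains one stuck cell that needs the data inverted and another that needs it unchanged, which is why the choice of partition determines how many faults a block can tolerate.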
Citations: 21
Accelerating functional timing analysis with encoding duplication removal and redundant state propagation
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203768
D. Wu, Pin-Ru Jhao, Charles H.-P. Wen
Functional timing analysis (FTA) has emerged as a way to achieve better timing closure than static timing analysis (STA) by providing the true delay of the circuit as well as the corresponding input pattern. In satisfiability (SAT)-based FTA, the search for the circuit delay can be expressed by clauses corresponding to the circuit consistency function (CCF) and the timed characteristic function (TCF). The number of clauses tends to grow exponentially as the circuit size increases, lengthening FTA runtime. However, many of the clauses and literals generated when formulating the TCF turn out to be useless. Therefore, two key techniques are proposed: (1) Encoding Duplication Removal (EDR), which removes literals that were already encoded in the CCF and are duplicated in the TCF, and (2) Redundant State Propagation (RSP), which propagates redundant states of nodes to help prune TCF clauses. Experiments on seven benchmark circuits indicate that, under the worst-case delay of each circuit, EDR and RSP reduce clauses by 49%, literals by 65%, and runtime by 52% on average.
Citations: 1
Fast physics-based electromigration assessment by efficient solution of linear time-invariant (LTI) systems
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203840
S. Chatterjee, V. Sukharev, F. Najm
Electromigration (EM) is a key reliability concern in chip power/ground (p/g) grids, which has been exacerbated by the high current levels and narrow metal lines in modern grids. EM checking is expensive due to the large sizes of modern p/g grids and is also inherently difficult due to the complex nature of the EM phenomenon. Traditional EM checking, based on empirical models, cannot capture the complexity of EM, and better models are needed for accurate prediction. Thus, physics-based EM models have recently been proposed, but they remain computationally expensive because they require the solution of a system of partial differential equations (PDEs). In this paper, we propose a fast and scalable methodology for power grid EM verification, building on previous physics-based models. We first convert the PDE system to a succession of homogeneous linear time-invariant (LTI) systems. Because these systems are found to be stiff, we numerically integrate them using optimized variable-step backward differentiation formulas (BDFs). Our method, for a number of IBM power grids and internal benchmarks, achieves an average speed-up of over 20x compared to previously published work and has a runtime of only about 8 minutes for a 4 million node grid.
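To see why stiffness motivates backward differentiation formulas, the sketch below integrates a small homogeneous LTI system x' = Ax with the first-order BDF (backward Euler) and, for contrast, with forward Euler. It is a toy under stated assumptions: the 2x2 matrix, the fixed step size, and all function names are hypothetical, whereas the paper uses optimized variable-step BDFs on much larger systems.

```python
import math

def solve2(m, b):
    """Solve a 2x2 linear system m x = b by Cramer's rule."""
    (a, b_), (c, d) = m
    det = a * d - b_ * c
    return [(b[0] * d - b_ * b[1]) / det, (a * b[1] - c * b[0]) / det]

def backward_euler(A, x0, h, steps):
    """BDF1: solve (I - h A) x_next = x_current at every step."""
    m = [[1 - h * A[0][0], -h * A[0][1]],
         [-h * A[1][0], 1 - h * A[1][1]]]
    x = list(x0)
    for _ in range(steps):
        x = solve2(m, x)
    return x

def forward_euler(A, x0, h, steps):
    """Explicit Euler: x_next = x_current + h A x_current."""
    x = list(x0)
    for _ in range(steps):
        x = [x[0] + h * (A[0][0] * x[0] + A[0][1] * x[1]),
             x[1] + h * (A[1][0] * x[0] + A[1][1] * x[1])]
    return x

if __name__ == "__main__":
    A = [[-1.0, 0.0], [0.0, -1000.0]]   # time constants differ by 1000x: stiff
    x0 = [1.0, 1.0]
    print(backward_euler(A, x0, 0.01, 100), math.exp(-1.0))
    print(forward_euler(A, x0, 0.01, 100))
```

With this step size the implicit method tracks the slow mode and damps the fast one, while the explicit method diverges on the fast mode; an explicit scheme would need a far smaller step to stay stable.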
Citations: 15
Exploring the exponential integrators with Krylov subspace algorithms for nonlinear circuit simulation
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203774
Xinyuan Wang, Hao Zhuang, Chung-Kuan Cheng
We explore Krylov subspace algorithms to calculate the ϕ functions of exponential integrators for circuit simulation. Higham [1] pointed out the potential numerical stability risk of computing ϕ functions. However, for applications to circuit analysis, the choice of methods remains open. This work examines the accuracy of matrix-exponential-vector products computed with Krylov subspace methods and identifies the proper approach to achieving numerically stable solutions for nonlinear circuits. Empirical results verify the quality of the proposed methods using various orders of ϕ functions. Furthermore, instead of the Newton-Raphson (NR) iterations used in conventional methods, an iterative residue correction algorithm is devised for nonlinear system analysis. The stability and efficiency of our methods are illustrated with experiments.
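A scalar example gives a feel for the kind of numerical risk involved in evaluating ϕ functions. The first-order function is ϕ₁(z) = (e^z − 1)/z, and evaluating that formula directly loses accuracy for small |z| because of cancellation. The sketch below is only a scalar illustration with hypothetical function names; it is not the Krylov subspace method of the paper, and it is not claimed to be the specific issue analyzed in [1].

```python
import math

def phi1_naive(z):
    """Direct formula (e^z - 1) / z; suffers cancellation when |z| is tiny."""
    return (math.exp(z) - 1.0) / z

def phi1_stable(z):
    """Same function using expm1, which avoids subtracting nearly equal numbers."""
    return math.expm1(z) / z if z != 0.0 else 1.0

def phi1_series(z, terms=20):
    """Taylor series: sum over k >= 0 of z**k / (k + 1)!."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= z / (k + 2)   # turns z**k/(k+1)! into z**(k+1)/(k+2)!
    return total

if __name__ == "__main__":
    for z in (1.0, 1e-6, 1e-12):
        print(z, phi1_naive(z), phi1_stable(z), phi1_series(z))
```

For moderate z the three agree, while for very small z the naive version drifts away from the other two even though the true value is close to 1.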
Citations: 5
Stress-aware performance evaluation of 3D-stacked wide I/O DRAMs
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203838
Tengtao Li, S. Sapatnekar
3D-stacked wide I/O DRAM can significantly increase cell density and bandwidth while also lowering power consumption. However, 3D structures experience significant thermomechanical stress, which impacts circuit performance. This paper develops a procedure that performs a full performance analysis of 3D DRAMs, including latency, leakage power, refresh power, and area, while incorporating the effects of both layout-aware stress and layout-independent stress. The approach first proposes an analytic stress analysis method for the entire 3D DRAM structure, capturing the stress induced by TSVs, micro bumps, package bumps and warpage. Next, this stress is translated to variations in device mobility and threshold voltage, after which analytical models for latency, leakage power, and refresh power are derived. Finally, a complete analysis of performance variations is performed for various 3D DRAM layout configurations to assess the impact of layout-dependent stress.
Citations: 2
Fast physics-based electromigration analysis for multi-branch interconnect trees
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203775
Xiaoyi Wang, Yan Yan, Jian He, S. Tan, Chase Cook, Shengqi Yang
Electromigration (EM) has become one of the most challenging reliability issues for current and future ICs at the 10nm technology node and below. In this paper, we propose a new analysis method for EM hydrostatic stress evolution in multi-branch interconnect trees, which is the foundation of EM reliability assessment for large-scale on-chip interconnect networks such as power grid networks. The proposed method, which is based on an eigenfunction technique, can efficiently calculate the hydrostatic stress evolution for multi-branch interconnect trees stressed with different current densities and non-uniformly distributed thermal effects. The new method can also accommodate pre-existing residual stresses coming from thermal or other stress sources. The proposed method solves the partial differential equations of EM stress more efficiently because it does not require any discretization, either spatial or temporal, in contrast to numerical methods such as the finite difference method and the finite element method. The accuracy of the proposed transient analysis approach is validated against the analytical solution and commercial tools. The efficiency of the proposed method is demonstrated by comparison with the finite difference method: the proposed method is 10× to 100× faster and scales better for larger interconnect trees.
Citations: 15
Thermal-sensitive design and power optimization for a 3D torus-based optical NoC
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203863
Kang Yao, Yaoyao Ye, S. Pasricha, Jiang Xu
To overcome the limitations of traditional electronic interconnects in terms of power efficiency and bandwidth density, optical networks-on-chip (NoCs) based on 3D integrated silicon photonics have been proposed as an emerging on-chip communication architecture for multiprocessor systems-on-chip (MPSoCs) with large core counts. However, due to thermo-optic effects, wavelength-selective silicon photonic devices such as microresonators, which are widely used in optical NoCs, suffer from temperature-dependent wavelength shifts. As a result, on-chip temperature variations cause significant thermally induced optical power loss, which may counteract the power advantages of optical NoCs. To tackle this problem, we present a thermal-sensitive design and power optimization approach for a 3D torus-based optical NoC architecture. Based on an optical thermal modeling platform that models the thermal effect in optical NoCs from a system-level perspective, a thermal-sensitive routing algorithm is proposed for the 3D torus-based optical NoC to optimize its power consumption in the presence of on-chip temperature variations. Simulation results show that in an 8×8×2 3D torus-based optical NoC under a set of real applications, as compared with a matched 3D mesh-based optical NoC with traditional dimension-order routing, the power consumption is reduced by 25% if thermal tuning for microresonators is not utilized, by 19% if thermal tuning is utilized for microresonators, and by 17% if athermal microresonators are used.
Citations: 14
P4: Phase-based power/performance prediction of heterogeneous systems via neural networks
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203843
Yeseong Kim, Pietro Mercati, A. More, Emily J. Shriver, T. Simunic
The emergence of the Internet of Things increases the complexity and heterogeneity of computing platforms. Migrating workloads between platforms is one way to improve both energy efficiency and performance. Effective migration decisions require accurate estimates of the costs and benefits of migration. To date, these estimates have been made either by instrumenting the source code or binaries, which causes high overhead, or by using power estimates from hardware performance counters, which work well for individual machines but until now have not been accurate for predicting across different architectures. In this paper, we propose P4, a new Phase-based Power and Performance Prediction framework that identifies cross-platform application power and performance at runtime for heterogeneous computing systems. P4 analyzes and detects machine-independent application phases by characterizing computing platforms offline with a set of benchmarks, and then builds neural network-based models to automatically identify and generalize the complex cross-platform relationships for each benchmark phase. It then leverages these models, along with performance counter measurements collected at runtime, to estimate the performance and power consumption an application would have on a completely different computing platform, including a different CPU architecture, without ever having to run it there. We evaluate the proposed framework on four commercial heterogeneous platforms, ranging from x86 servers to mobile ARM-based architectures, with 129 industry-standard benchmarks. Our experimental results show that P4 can predict power and performance changes with only 6.8% and 5.6% error, respectively, even for architectures completely different from the ones the applications ran on.
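The two-stage flow described in the abstract (detect the current phase from runtime counters, then apply a per-phase cross-platform model) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' implementation: phase detection is replaced by a nearest-centroid lookup, each neural network is replaced by a hand-written linear model, and all names, counters, and numbers are hypothetical.

```python
import math

# Hypothetical phase centroids in a two-counter feature space
# (for example, normalized IPC and cache-miss rate). Values are made up.
PHASE_CENTROIDS = [(0.9, 0.1), (0.2, 0.8)]

# Hypothetical per-phase models mapping source-platform counters to
# target-platform power, as (weights, bias). A linear model stands in
# here for the neural networks that the abstract describes.
PHASE_MODELS = [((2.0, 1.0), 0.5), ((1.0, 3.0), 1.0)]


def detect_phase(counters):
    """Return the index of the phase whose centroid is closest to the sample."""
    return min(
        range(len(PHASE_CENTROIDS)),
        key=lambda i: math.dist(counters, PHASE_CENTROIDS[i]),
    )


def predict_target_power(counters):
    """Pick the phase for this counter sample, then apply that phase's model."""
    phase = detect_phase(counters)
    weights, bias = PHASE_MODELS[phase]
    power = sum(w * x for w, x in zip(weights, counters)) + bias
    return phase, power


if __name__ == "__main__":
    for sample in [(0.8, 0.2), (0.1, 0.9)]:
        phase, power = predict_target_power(sample)
        print(f"counters={sample} -> phase {phase}, predicted power {power:.2f}")
```

Keeping one model per phase means each model only has to capture the cross-platform relationship for workloads that behave alike, which is the motivation the abstract gives for detecting phases first.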
Citations: 16