Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203837
Meng Yang, J. Hayes, Deliang Fan, Weikang Qian
Stochastic computing (SC) is an unconventional computing paradigm that operates on stochastic bit streams. It has gained attention recently because of the very low area and power needs of its computing core. SC relies on stochastic number generators (SNGs) to map input binary numbers to stochastic bit streams. A conventional SNG comprises a random number source (RNS), typically an LFSR, and a comparator. It needs far more area and power than the SC core, offsetting the latter's main advantages. To mitigate this problem, SNGs employing emerging nanoscale devices such as memristors and spintronic devices have been proposed. However, these devices tend to have large errors in their output probabilities due to unpredictable variations in their fabrication processes and noise in their control signals. We present a novel method of exploiting such devices to design a highly accurate SNG. It is built around an RNS that generates uniformly distributed random numbers under ideal (nominal) conditions. It also has a novel error-cancelling probability conversion circuit (ECPCC) that guarantees very high accuracy in the output probability under realistic conditions when the RNS is subject to errors. An ECPCC can also be used to generate maximally correlated stochastic streams, a useful property for some applications.
Title: Design of accurate stochastic number generators with noisy emerging devices for stochastic computing. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203837
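To make the baseline concrete, the conventional SNG described in the abstract (a random number source feeding a comparator) can be sketched in a few lines of Python. This is only an illustration of the LFSR-plus-comparator idea, not the paper's ECPCC design; the 8-bit width, the feedback taps (the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1), and the names `lfsr8` and `stochastic_stream` are assumptions chosen for the example.

```python
from itertools import islice


def lfsr8(seed=0x01):
    """Yield the states of an 8-bit Fibonacci LFSR (assumed taps: bits 7, 5, 4, 3)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("an LFSR must not start in the all-zero state")
    while True:
        yield state
        feedback = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | feedback) & 0xFF


def stochastic_stream(x, length, seed=0x01):
    """Map the 8-bit binary number x to a bit stream whose fraction of 1s approximates x/256.

    The comparator emits 1 whenever the current random number is below x.
    """
    return [1 if r < x else 0 for r in islice(lfsr8(seed), length)]


if __name__ == "__main__":
    bits = stochastic_stream(64, 255)
    # The fraction of 1s should be close to 64/256 = 0.25.
    print(sum(bits) / len(bits))
```

In this sketch the LFSR and the comparator together stand in for the whole SNG, which reflects the abstract's point that the generator, rather than the SC core, tends to dominate area and power.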
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203819
Chin-Hao Chang, Yao-Wen Chang, Tung-Chieh Chen
In this paper, we present a damped-wave constructive macro placement framework which packs big macros to optimize wirelength and routability simultaneously. Unlike traditional V-shaped and Λ-shaped multilevel frameworks, which might lack local and global information, respectively, during processing, our damped-wave framework considers both local and global information through two major techniques: (1) macro clustering to improve scalability, and (2) constructive macro declustering to help a standard-cell placer obtain better solutions. We also present a macro-grouping cost model to remedy a key drawback of existing three-stage mixed-size placers (consisting of prototyping, macro placement, and standard-cell placement): they ignore the mismatches in standard-cell locations between the prototyping stage and the final standard-cell placement stage. We further propose a regularity penalty model that guides macros to form an integral, regular region during macro placement, facilitating the subsequent placement of standard cells. Compared with manual placement from industry and with a leading mixed-size placer, experimental results show that our damped-wave multilevel framework and cost models are efficient and effective in reducing half-perimeter wirelength, routed wirelength, and overflows. In particular, our work provides a new research direction on effective frameworks for large-scale designs, which readily applies to many optimization problems limited by scalability.
Title: A novel damped-wave framework for macro placement. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203819
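The half-perimeter wirelength (HPWL) metric reported in the abstract above is the standard bounding-box estimate of net length. A minimal sketch, assuming each net is given as a list of (x, y) pin coordinates; the function names and the sample nets are hypothetical and are not taken from the paper.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box enclosing one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum the HPWL over all nets; nets maps a net name to its pin list."""
    return sum(hpwl(pins) for pins in nets.values())


if __name__ == "__main__":
    nets = {
        "n1": [(0, 0), (3, 4), (1, 1)],  # bounding box is 3 wide and 4 tall
        "n2": [(2, 2), (5, 2)],          # bounding box is 3 wide and 0 tall
    }
    print(total_hpwl(nets))
```

HPWL is cheap to evaluate, which is why placers commonly use it as a proxy during optimization and report routed wirelength separately.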
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203839
Jiangwei Zhang, Donald Kline, Liang Fang, R. Melhem, A. Jones
Emerging non-volatile memories have many advantages over conventional memory. Unfortunately, many are susceptible to write endurance challenges, resulting in stuck-at faults. Existing mitigation methods statically partition and invert data within a block containing such faults (partition-and-flip) so that the data written matches the stuck-at cells and the block can remain in service. Unfortunately, these schemes have limited fault tolerance capabilities and require the assumption that their auxiliary bits are fault free. We propose a dynamic partitioning scheme that improves the number of tolerated stuck-at faults and simultaneously protects auxiliary bits. Dynamic partitioning can significantly improve the fault tolerance over existing static partitioning approaches with an equal number of auxiliary bits. Moreover, it can often still improve fault tolerance while reducing the number of auxiliary bits. Compared to flip-N-write and Aegis, a leading mitigation scheme, dynamic partitioning can achieve 7–72% and 5–53× lower write error rates, respectively, for the same capacity overhead with a stuck-at-fault rate of 10^-3.
Title: Dynamic partitioning to mitigate stuck-at faults in emerging memories. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203839
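The static partition-and-flip idea that the abstract above takes as its baseline can be illustrated with a small sketch. It assumes fixed-size partitions with one flip bit each and, as the abstract notes for existing schemes, treats the flip bits as fault free. This is not the paper's dynamic scheme, and the names `encode_block` and `decode_block` are hypothetical.

```python
def encode_block(data, stuck, part_size):
    """Try to store data in a block with stuck-at cells.

    data:  list of 0/1 values to write.
    stuck: dict mapping a cell index to the value that cell is stuck at.
    Returns (stored_bits, flip_bits), or None if some partition cannot be matched.
    """
    stored, flips = [], []
    for start in range(0, len(data), part_size):
        part = range(start, min(start + part_size, len(data)))
        # A stuck cell needs the partition inverted exactly when data != stuck value.
        needed = {data[i] ^ stuck[i] for i in part if i in stuck}
        if len(needed) > 1:
            return None  # two stuck cells in this partition disagree about flipping
        flip = needed.pop() if needed else 0
        flips.append(flip)
        stored.extend(data[i] ^ flip for i in part)
    return stored, flips


def decode_block(stored, flips, part_size):
    """Undo the per-partition inversion to recover the original data."""
    return [bit ^ flips[i // part_size] for i, bit in enumerate(stored)]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    stuck = {1: 1, 6: 1}  # cells 1 and 6 are stuck at 1
    stored, flips = encode_block(data, stuck, part_size=4)
    print(stored, flips, decode_block(stored, flips, 4) == data)
```

The failure case (`None`) is what limits static schemes: two stuck cells that land in the same partition but need opposite flip decisions cannot both be satisfied, which is the situation a different choice of partitioning can avoid.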
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203768
D. Wu, Pin-Ru Jhao, Charles H.-P. Wen
Functional timing analysis (FTA) has emerged as a way to achieve better timing closure than static timing analysis (STA) by providing the true delay of the circuit together with the corresponding input pattern. For satisfiability (SAT)-based FTA, the search problem for circuit delay can be expressed by clauses corresponding to the circuit consistency function (CCF) and the timed characteristic function (TCF). In particular, the number of clauses tends to grow exponentially as the circuit size increases, lengthening the runtime of FTA. However, many of the clauses and literals produced when formulating the TCF turn out to be useless. Therefore, two key techniques are proposed: (1) Encoding Duplication Removal (EDR), which removes literals that were already encoded in the CCF and are duplicated in the TCF, and (2) Redundant State Propagation (RSP), which propagates redundant states of nodes to help prune TCF clauses. Experiments indicate that, under the worst-case delay of each benchmark circuit, EDR and RSP reduce clauses by 49%, literals by 65%, and runtime by 52% on average across seven benchmark circuits.
Title: Accelerating functional timing analysis with encoding duplication removal and redundant state propagation. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203768
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203840
S. Chatterjee, V. Sukharev, F. Najm
Electromigration (EM) is a key reliability concern in chip power/ground (p/g) grids, which has been exacerbated by the high current levels and narrow metal lines in modern grids. EM checking is expensive due to the large sizes of modern p/g grids and is also inherently difficult due to the complex nature of the EM phenomenon. Traditional EM checking, based on empirical models, cannot capture the complexity of EM, and better models are needed for accurate prediction. Physics-based EM models have therefore been proposed recently, but they remain computationally expensive because they require the solution of a system of partial differential equations (PDEs). In this paper, we propose a fast and scalable methodology for power grid EM verification, building on previous physics-based models. We first convert the PDE system to a succession of homogeneous linear time-invariant (LTI) systems. Because these systems are found to be stiff, we numerically integrate them using optimized variable-step backward differentiation formulas (BDFs). Our method, for a number of IBM power grids and internal benchmarks, achieves an average speed-up of over 20x as compared to previously published work and has a runtime of only about 8 minutes for a 4 million node grid.
Title: Fast physics-based electromigration assessment by efficient solution of linear time-invariant (LTI) systems. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203840
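Why stiffness pushes the method in the abstract above toward backward differentiation formulas can be seen on a toy homogeneous LTI system x' = A x. The sketch below uses the first-order BDF (backward Euler) with a fixed step on a hand-picked 2x2 matrix. The paper's variable-step, optimized BDFs and its actual grid models are not reproduced here, and the matrix, step size, and function names are assumptions made for the illustration.

```python
def backward_euler_step(A, x, h):
    """One BDF1 step for x' = A x: solve (I - h*A) y = x (2x2 case, Cramer's rule)."""
    m00, m01 = 1 - h * A[0][0], -h * A[0][1]
    m10, m11 = -h * A[1][0], 1 - h * A[1][1]
    det = m00 * m11 - m01 * m10
    return [(x[0] * m11 - m01 * x[1]) / det,
            (m00 * x[1] - m10 * x[0]) / det]


def forward_euler_step(A, x, h):
    """One explicit Euler step, shown only for comparison."""
    return [x[0] + h * (A[0][0] * x[0] + A[0][1] * x[1]),
            x[1] + h * (A[1][0] * x[0] + A[1][1] * x[1])]


def integrate(step, A, x0, h, n_steps):
    """Apply the given one-step method n_steps times starting from x0."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(A, x, h)
    return x


if __name__ == "__main__":
    A = [[-1000.0, 1.0], [0.0, -1.0]]  # time constants differ by a factor of 1000
    x0, h, n = [1.0, 1.0], 0.1, 50
    print("backward Euler:", integrate(backward_euler_step, A, x0, h, n))
    print("forward Euler: ", integrate(forward_euler_step, A, x0, h, n))
```

With a step of 0.1 the implicit update decays as the true solution does, whereas the explicit update multiplies the fast component by roughly -99 per step and diverges. An explicit method would need steps on the order of the fastest time constant, which is the cost that implicit formulas avoid.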
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203774
Xinyuan Wang, Hao Zhuang, Chung-Kuan Cheng
We explore Krylov subspace algorithms to calculate the ϕ functions of exponential integrators for circuit simulation. Higham [1] pointed out the potential numerical stability risk of computing ϕ functions. However, for applications to circuit analysis, the choice of methods remains open. This work inspects the accuracy of the matrix exponential and vector product computed with Krylov subspace methods, and identifies the proper approach to achieving numerically stable solutions for nonlinear circuits. Empirical results verify the quality of the proposed methods using various orders of ϕ functions. Furthermore, instead of the Newton-Raphson (NR) iterations used in conventional methods, an iterative residue correction algorithm is devised for nonlinear system analysis. The stability and efficiency of our methods are illustrated with experiments.
Title: Exploring the exponential integrators with Krylov subspace algorithms for nonlinear circuit simulation. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203774
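The numerical stability risk mentioned in the abstract above already shows up in the scalar case. The first ϕ function is ϕ1(z) = (e^z - 1)/z, and more generally ϕk(z) is the series sum over j >= 0 of z^j / (j + k)!. Evaluating ϕ1 by the textbook formula loses accuracy to cancellation when |z| is small. The sketch below contrasts that with two stable scalar evaluations. It does not implement the Krylov subspace methods studied in the paper, and the function names are hypothetical.

```python
import math


def phi1_naive(z):
    """Textbook formula; suffers catastrophic cancellation for small |z|."""
    return (math.exp(z) - 1.0) / z


def phi1_stable(z):
    """Same quantity via expm1, which avoids the cancellation."""
    return 1.0 if z == 0.0 else math.expm1(z) / z


def phi_series(k, z, terms=30):
    """Truncated series phi_k(z) = sum_j z**j / (j + k)!, intended for moderate |z|."""
    return sum(z ** j / math.factorial(j + k) for j in range(terms))


if __name__ == "__main__":
    z = 1e-12
    print("naive :", phi1_naive(z))
    print("stable:", phi1_stable(z))
    print("series:", phi_series(1, z))
```

For z near 1e-12 the exact value is indistinguishable from 1 in double precision, so any visible deviation in the naive result is rounding error rather than signal.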
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203838
Tengtao Li, S. Sapatnekar
3D-stacked wide I/O DRAM can significantly increase cell density and bandwidth while also lowering power consumption. However, 3D structures experience significant thermomechanical stress, which impacts circuit performance. This paper develops a procedure that performs a full performance analysis of 3D DRAMs, including latency, leakage power, refresh power, and area, while incorporating the effects of both layout-aware stress and layout-independent stress. The approach first proposes an analytic stress analysis method for the entire 3D DRAM structure, capturing the stress induced by TSVs, micro bumps, package bumps and warpage. Next, this stress is translated to variations in device mobility and threshold voltage, after which analytical models for latency, leakage power, and refresh power are derived. Finally, a complete analysis of performance variations is performed for various 3D DRAM layout configurations to assess the impact of layout-dependent stress.
Title: Stress-aware performance evaluation of 3D-stacked wide I/O DRAMs. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203838
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203775
Xiaoyi Wang, Yan Yan, Jian He, S. Tan, Chase Cook, Shengqi Yang
Electromigration (EM) has become one of the most challenging reliability issues for current and future ICs in 10nm technology and below. In this paper, we propose a new analysis method for EM hydrostatic stress evolution in multi-branch interconnect trees, which is the foundation of EM reliability assessment for large-scale on-chip interconnect networks such as power grid networks. The proposed method, which is based on an eigenfunction technique, can efficiently calculate the hydrostatic stress evolution for multi-branch interconnect trees stressed with different current densities and non-uniformly distributed thermal effects. The new method can also accommodate pre-existing residual stresses coming from thermal or other stress sources. The proposed method solves the partial differential equations of EM stress more efficiently because it does not require any discretization, either spatial or temporal, in contrast to numerical methods such as the finite difference method and the finite element method. The accuracy of the proposed transient analysis approach is validated against the analytical solution and commercial tools. The efficiency of the proposed method is demonstrated and compared to the finite difference method. The proposed method is 10× to 100× faster than the finite difference method and scales better for larger interconnect trees.
Title: Fast physics-based electromigration analysis for multi-branch interconnect trees. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203775
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203863
Kang Yao, Yaoyao Ye, S. Pasricha, Jiang Xu
In order to overcome limitations of traditional electronic interconnects in terms of power efficiency and bandwidth density, optical networks-on-chip (NoCs) based on 3D integrated silicon photonics have been proposed as an emerging on-chip communication architecture for multiprocessor systems-on-chip (MPSoCs) with large core counts. However, due to thermo-optic effects, wavelength-selective silicon photonic devices such as microresonators, which are widely used in optical NoCs, suffer from temperature-dependent wavelength shifts. As a result, on-chip temperature variations cause significant thermal-induced optical power loss which may counteract the power advantages of optical NoCs. To tackle this problem, in this work, we present a thermal-sensitive design and power optimization approach for a 3D torus-based optical NoC architecture. Based on an optical thermal modeling platform which models the thermal effect in optical NoCs from a system-level perspective, a thermal-sensitive routing algorithm is proposed for the 3D torus-based optical NoC to optimize its power consumption in the presence of on-chip temperature variations. Simulation results show that in an 8×8×2 3D torus-based optical NoC under a set of real applications, as compared with a matched 3D mesh-based optical NoC with traditional dimension order routing, the power consumption is reduced by 25% if thermal tuning for microresonators is not utilized, by 19% if thermal tuning is utilized for microresonators, and by 17% if athermal microresonators are used.
Title: Thermal-sensitive design and power optimization for a 3D torus-based optical NoC. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203863
Pub Date: 2017-11-13 DOI: 10.1109/ICCAD.2017.8203843
Yeseong Kim, Pietro Mercati, A. More, Emily J. Shriver, T. Simunic
The emergence of the Internet of Things increases the complexity and the heterogeneity of computing platforms. Migrating workloads between various platforms is one way to improve both energy efficiency and performance. Effective migration decisions require accurate estimates of the costs and benefits of migration. To date, these estimates have been made either by instrumenting the source code or binaries, which causes high overhead, or by using power estimates from hardware performance counters, which work well for individual machines but until now have not been accurate for predicting across different architectures. In this paper, we propose P4, a new Phase-based Power and Performance Prediction framework which identifies cross-platform application power and performance at runtime for heterogeneous computing systems. P4 analyzes and detects machine-independent application phases by characterizing computing platforms offline with a set of benchmarks, and then builds neural network-based models to automatically identify and generalize the complex cross-platform relationships for each benchmark phase. It then leverages these models, along with performance counter measurements collected at runtime, to estimate the performance and power consumption of an application as if it were running on a completely different computing platform, including a different CPU architecture, without ever having to run it there. We evaluate the proposed framework on four commercial heterogeneous platforms, ranging from x86 servers to mobile ARM-based architectures, with 129 industry-standard benchmarks. Our experimental results show that P4 can predict the power and performance changes with only 6.8% and 5.6% error, respectively, even for architectures completely different from the ones the applications ran on.
Title: P4: Phase-based power/performance prediction of heterogeneous systems via neural networks. In: 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). https://doi.org/10.1109/ICCAD.2017.8203843