
Latest publications from the 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Cross-program design space exploration by ensemble transfer learning
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203779
Dandan Li, Shuzhen Yao, Senzhang Wang, Y. Wang
Due to the increasing complexity of processor architectures and the time-consuming nature of software simulation, efficient design space exploration (DSE) has become a critical challenge in processor design. To address this challenge, machine learning techniques have recently been widely explored for predicting the performance of various configurations from only a small number of simulations used as training samples. However, most existing methods randomly select samples for simulation from the entire configuration space to build program-specific predictors. When a new program is considered, a large number of new program-specific simulations are needed to build a new predictor; thus considerable simulation cost is incurred for each program. In this paper, we propose an efficient cross-program DSE framework, TrEE, that combines a flexible statistical sampling strategy with an ensemble transfer learning technique. Specifically, TrEE comprises the following two phases, which also form our major contributions: 1) an orthogonal-array-based foldover design for flexibly sampling representative configurations for simulation, and 2) an ensemble transfer learning algorithm that effectively transfers knowledge among different types of programs to improve prediction performance on the new program. We evaluate TrEE on benchmarks from the SPEC CPU 2006 suite. The results demonstrate that TrEE is much more efficient and robust than state-of-the-art DSE techniques.
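To make the ensemble-transfer idea concrete, here is a minimal sketch with synthetic data. The inverse-error weighting scheme below is our own assumption for illustration, not TrEE's algorithm, and none of the model names come from the paper:

```python
# Toy sketch of cross-program ensemble transfer (NOT the paper's TrEE):
# predictors trained on source programs are blended with weights
# inversely proportional to their error on a few simulated samples
# from the new target program. All models and data are synthetic.

def ensemble_predict(source_models, target_samples, query):
    """source_models: callables mapping a configuration to predicted
    performance; target_samples: (config, measured) pairs from the new
    program; query: configuration to predict for."""
    weights = []
    for model in source_models:
        err = sum(abs(model(cfg) - y) for cfg, y in target_samples)
        weights.append(1.0 / (err + 1e-9))   # low error -> high weight
    total = sum(weights)
    return sum(w * m(query) for w, m in zip(weights, source_models)) / total

# Two toy "source program" predictors over a single configuration knob.
m_good = lambda cfg: 2.0 * cfg           # happens to match the target
m_poor = lambda cfg: 10.0 + 0.1 * cfg    # poor match for the target
target_samples = [(1.0, 2.0), (2.0, 4.0)]  # target behaves like m_good
pred = ensemble_predict([m_good, m_poor], target_samples, query=3.0)
```

Because `m_good` has zero error on the target samples, its weight dominates and the combined prediction lands near 6.0.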
Citations: 2
DAGSENS: Directed acyclic graph based direct and adjoint transient sensitivity analysis for event-driven objective functions
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203773
K. Aadithya, E. Keiter, Ting Mei
We present DAGSENS, a new approach to parametric transient sensitivity analysis of Differential Algebraic Equation systems (DAEs), such as SPICE-level circuits. The key ideas behind DAGSENS are: (1) to represent the entire sequence of computations from DAE parameters to the objective function (whose sensitivity is needed) as a Directed Acyclic Graph (DAG) called the “sensitivity DAG”, and (2) to compute the required sensitivities efficiently by using dynamic programming techniques to traverse the DAG. DAGSENS is simple, elegant, and easy to understand compared to previous approaches; for example, in DAGSENS, one can switch between direct and adjoint sensitivities simply by reversing the direction of DAG traversal. Also, DAGSENS is more powerful than previous approaches because it works for a more general class of objective functions, including those based on “events” that occur during a transient simulation (e.g., a node voltage crossing a threshold, a phase-locked loop (PLL) achieving lock, a circuit signal reaching its maximum/minimum value, etc.). In this paper, we demonstrate DAGSENS on several electronic and biological applications, including high-speed communication, statistical cell library characterization, and gene expression.
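The direct-vs-adjoint duality described above can be shown on a two-node toy DAG (our own construction, not DAGSENS itself): both traversal directions recover the same sensitivity dy/dp.

```python
# Toy DAG: p -> u = p*p -> y = u + 3*p, so dy/dp = 2p + 3.
# Direct (forward) traversal pushes d(node)/dp from parameter to output;
# adjoint (reverse) traversal pulls dy/d(node) from output to parameter.

def direct_sensitivity(p):
    du_dp = 2.0 * p             # local derivative of u = p*p
    dy_dp = du_dp * 1.0 + 3.0   # y = u + 3*p, chain rule moving forward
    return dy_dp

def adjoint_sensitivity(p):
    dy_du = 1.0                 # y depends on u with coefficient 1
    dy_dp_direct = 3.0          # y also depends on p directly
    du_dp = 2.0 * p
    return dy_du * du_dp + dy_dp_direct  # accumulate moving backward

sens_direct = direct_sensitivity(2.5)    # 2*2.5 + 3 = 8.0
sens_adjoint = adjoint_sensitivity(2.5)  # same value, reverse traversal
```

On a real sensitivity DAG the two directions differ in cost (direct scales with the number of parameters, adjoint with the number of objectives), which is why switching between them by reversing traversal is useful.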
Citations: 2
FPGA placement and routing
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203878
Shih-Chun Chen, Yao-Wen Chang
FPGAs have emerged as a popular implementation style for modern circuit designs, due mainly to their low non-recurring engineering costs, in-field reprogrammability, and short turn-around time. A modern FPGA consists of an array of heterogeneous logic components, surrounded by routing resources and bounded by I/O cells. Compared to an ASIC, an FPGA has more limited logic and routing resources, diverse architectures, and strict design constraints; as a result, FPGA placement and routing problems become much more challenging. With growing complexity, diverse design objectives, high heterogeneity, and evolving technologies, modern FPGA placement and routing also bring up many emerging research opportunities. In this paper, we introduce basic FPGA architectures, describe the placement and routing problems for FPGAs, and explain key techniques to solve them (including three major placement paradigms: partitioning, simulated annealing, and analytical placement; two routing paradigms: sequential and concurrent routing; and simultaneous placement and routing). Finally, we provide some future research directions for FPGA placement and routing.
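Of the placement paradigms named above, simulated annealing is the simplest to sketch. The toy below is our own single-row construction, not a real FPGA placer: it swaps cells and accepts uphill moves with a temperature-controlled probability, shrinking total net length.

```python
import math
import random

# Toy simulated-annealing placement: cells occupy slots on one row,
# nets connect cell pairs, and the cost is total net span (a 1-D
# stand-in for half-perimeter wirelength).

def wirelength(pos, nets):
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def anneal(num_cells, nets, iters=2000, t0=5.0, seed=0):
    rng = random.Random(seed)
    pos = list(range(num_cells))           # cell i sits at slot pos[i]
    cost = best = wirelength(pos, nets)
    for step in range(iters):
        t = t0 * (1 - step / iters) + 1e-6  # cooling schedule
        i, j = rng.sample(range(num_cells), 2)
        pos[i], pos[j] = pos[j], pos[i]     # propose a swap
        new_cost = wirelength(pos, nets)
        # Accept improvements always; accept uphill moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            best = min(best, cost)
        else:
            pos[i], pos[j] = pos[j], pos[i]  # reject: undo the swap
    return best

# Three nested nets start at total length 5 + 3 + 1 = 9; the optimum
# (each connected pair adjacent) is 3.
best_cost = anneal(6, [(0, 5), (1, 4), (2, 3)])
```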
Citations: 17
A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203781
Yao Xiao, Yuankun Xue, Shahin Nazarian, P. Bogdan
Many-core multi-threaded performance is plagued by on-chip communication nonidealities, limited memory bandwidth, and critical sections. Inspired by complex network theory of social communities, we propose a novel methodology to model the dynamic execution of an application and partition the application into an optimal number of clusters for parallel execution. We first adopt an LLVM IR compiler analysis of a specific application and construct a dynamic application dependency graph encoding its computational and memory operations. Next, based on this graph, we propose an optimization model to find the optimal clusters such that (1) the intra-cluster edges are maximized, (2) the execution times of the clusters are nearly equalized, for load balancing, and (3) the cluster size does not exceed the core count. Our novel approach confines data movement to be mainly inside a cluster for power reduction and congestion prevention. Finally, we propose an algorithm to sort the graph of connected clusters topologically and map the clusters onto NoC. Experimental results on a 32-core NoC demonstrate a maximum speedup of 131.82% when compared to thread-based execution. Furthermore, the scalability of our framework makes it a promising software design automation platform.
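Goal (2) above, nearly equalized cluster execution times, is in the spirit of classic load balancing. A longest-processing-time greedy (illustrative only, not the paper's optimization model, and with invented node times) sketches it:

```python
import heapq

# Longest-processing-time greedy: repeatedly assign the largest
# remaining execution time to the currently least-loaded cluster.

def balance(times, k):
    heap = [(0.0, i, []) for i in range(k)]   # (load, cluster id, members)
    heapq.heapify(heap)
    for t in sorted(times, reverse=True):
        load, i, members = heapq.heappop(heap)  # least-loaded cluster
        members.append(t)
        heapq.heappush(heap, (load + t, i, members))
    return [(load, members) for load, _, members in heap]

# Eight synthetic node execution times split across two clusters.
clusters = balance([8, 7, 6, 5, 4, 3, 2, 1], k=2)
loads = sorted(load for load, _ in clusters)   # both end up at 18
```

The cluster-size cap and intra-cluster edge maximization from goals (1) and (3) are omitted here; they turn the problem into a constrained graph-partitioning task rather than pure scheduling.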
Citations: 41
Efficient simulation of EM side-channel attack resilience
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203769
Amit Kumar, C. Scarborough, Ali E. Yılmaz, M. Orshansky
Electromagnetic (EM) fields emanated during crypto-operations are an effective non-invasive channel for extracting secret keys. To predict vulnerabilities and improve resilience to EM side-channel analysis attacks, design-time simulation tools are needed. Predictive simulation of such attacks is computationally taxing, however, as it requires transient circuit and EM simulation for a large number of encryptions, with high modeling accuracy and high spatial and temporal resolution of EM fields. We developed a computational platform for EM side-channel attack analysis that uses commercial EDA tools to extract current waveforms and a custom EM simulator to radiate them. We achieve a 7000X speed-up over brute-force sequential simulation by identifying information-leaking cycles, deploying hybrid gate- and transistor-level simulation, radiating only EM-dominant currents, and simulating different encryptions in parallel. This permits a vulnerability study of a 32nm Advanced Encryption Standard (AES) block cipher design against differential attacks at a manageable cost of 20h per attack. We demonstrate that EM attacks can succeed with 6X fewer encryptions than power attacks, and we identify the worst information-leaking hotspots. The proposed platform enables targeted deployment of design-level countermeasures, leading us to identify a power/ground network design with a 4X security boost over an alternative.
Citations: 25
Learn-on-the-go: Autonomous cross-subject context learning for internet-of-things applications
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203800
Ramin Fallahzadeh, Parastoo Alinia, Hassan Ghasemzadeh
Developing machine learning algorithms for Internet-of-Things applications requires collecting a large amount of labeled training data, which is an expensive and labor-intensive process. Upon a minor change in the context, for example use by a new user, the model needs re-training to maintain its initial performance. To address this problem, we propose a graph model and an unsupervised label transfer algorithm (learn-on-the-go) that exploits the relations between source and target user data to develop a highly accurate and scalable machine learning model. Our analysis on real-world data demonstrates 54% and 22% performance improvement over baseline and state-of-the-art solutions, respectively.
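As a rough illustration of transferring labels between users: the paper's graph model is not reproduced here; the nearest-neighbor rule, feature vectors, and activity labels below are our own stand-ins.

```python
# Simplest possible unsupervised label transfer: each unlabeled point
# from the target user inherits the label of the closest labeled point
# from the source user in feature space.

def transfer_labels(source, target_points):
    """source: list of (feature_vector, label) from the labeled user;
    target_points: feature vectors from the new, unlabeled user."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(source, key=lambda s: dist(s[0], p))[1]
            for p in target_points]

# Synthetic 2-D features; labels are invented activity names.
source = [((0.0, 0.0), "walk"), ((5.0, 5.0), "run")]
labels = transfer_labels(source, [(0.5, 0.2), (4.0, 5.5)])
```

Cross-subject methods like the paper's go further by modeling relations between users, since raw feature spaces rarely align this cleanly across people.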
Citations: 2
A closed-loop design to enhance weight stability of memristor based neural network chips
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203824
Bonan Yan, J. Yang, Qing Wu, Yiran Chen, Hai Helen Li
Compared with algorithm optimizations, brain-inspired neural network chips aim to fundamentally change the computer architecture and thereby enhance computation capability and performance in advanced data processing. In recent years, memristor technology has been investigated for developing high-speed and large-capacity neural network chips. However, it has been observed that the memristance values that represent the well-trained network weights can be disturbed by electrical or thermal perturbations. This severely degrades overall system reliability and emerges as a major design challenge. In this work, we systematically analyze the impact of low-voltage-induced memristance drift on weight disturbance after repeated recall operations. A closed-loop design that introduces a real-time feedback controller is proposed to enhance the weight stability of memristor-based neural network chips. By mimicking the training process, the controller adaptively compensates the memristance deviation according to the relation between the input data and the recall output. In view of the tiny disturbance per access, we integrate the memristance compensation into the regular recall operation to avoid degrading execution speed. Our simulations based on the implementation of a representative single-layer (two-layer) network show that the proposed closed-loop design can prolong the service time of a memristor-based neural network chip by 14.85x (14.94x), without reducing computational speed. The extra circuitry of the feedback controller induces a negligible overhead of about 1.16% on overall power consumption.
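The feedback idea can be caricatured in a few lines; the drift magnitude and controller gain below are invented for illustration and are not measured device behavior or the paper's controller.

```python
# Toy model of closed-loop weight stabilization: each recall disturbs
# the stored weight slightly, and a feedback term cancels a fraction
# of the observed error against the trained target in the same access.

def recall_loop(target, drift_per_recall, gain, n_recalls):
    w = target
    for _ in range(n_recalls):
        w += drift_per_recall          # per-access low-voltage drift
        w -= gain * (w - target)       # feedback compensation (gain=0: open loop)
    return w

open_loop = recall_loop(1.0, drift_per_recall=0.001, gain=0.0, n_recalls=1000)
closed_loop = recall_loop(1.0, drift_per_recall=0.001, gain=0.5, n_recalls=1000)
```

Open loop, the weight drifts from 1.0 to 2.0 after 1000 recalls; closed loop, the error settles at a small fixed point of drift*(1-gain)/gain, staying within about 0.1% of the target.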
Citations: 33
Optimal multi-row detailed placement for yield and model-hardware correlation improvements in sub-10nm VLSI
Pub Date : 2017-11-13 DOI: 10.5555/3199700.3199789
C. Han, Kwangsoo Han, A. Kahng, Hyein Lee, Lutong Wang, Bangqi Xu
In sub-10nm nodes, a change or step in diffusion height between adjacent standard cells causes yield loss as well as a form of model-hardware miscorrelation called the neighbor diffusion effect (NDE). Cell libraries must inevitably have multiple diffusion heights (numbers of fins in PFETs and NFETs) in order to enable flexible exploration of the power-performance envelope for design. However, this brings step-induced risks of NDE, for which guardbanding is costly, as well as yield loss. Special filler cells can protect against harmful NDE effects, but are costly in terms of area. In this work, we develop dynamic programming-based single-row and double-row detailed placement optimizations that optimally minimize the impacts of NDE. Our algorithms support a richer set of cell movements than previous works, i.e., flipping, relocating, and reordering within the original row; we also consider cell displacement and flipping costs. Importantly, to our knowledge, our dynamic programming-based optimal detailed placement algorithm is the first to handle multiple rows with multiple-height cells that can be reordered. We further develop a timing-aware approach, which is capable of recovering (or improving) the worst negative slack (WNS) by creating additional diffusion steps around timing-critical cells.
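As a much-simplified caricature of the diffusion-step objective (a Held-Karp style subset DP over one tiny row; the paper's multi-row algorithm, displacement costs, and cost model are not reproduced):

```python
from functools import lru_cache

# Reorder a handful of cells in a single row so that the total
# diffusion-height step between neighbors is minimized. The DP state
# is (bitmask of placed cells, index of the rightmost cell).

def min_step_order(heights):
    n = len(heights)

    @lru_cache(maxsize=None)
    def best(placed_mask, last):
        if placed_mask == (1 << n) - 1:
            return 0
        return min(
            abs(heights[last] - heights[nxt])
            + best(placed_mask | (1 << nxt), nxt)
            for nxt in range(n) if not placed_mask & (1 << nxt)
        )

    return min(best(1 << first, first) for first in range(n))

# Heights (fin counts) of five toy cells; grouping equal heights
# (2, 2, 3, 3, 4) leaves a total step of 0 + 1 + 0 + 1 = 2.
cost = min_step_order([2, 3, 2, 4, 3])
```

The exponential-in-n state space is only workable for tiny windows; the paper's contribution is doing this kind of optimization efficiently across full rows and multiple rows.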
Citations: 13
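The abstract above names a dynamic-programming formulation but does not give it. As an illustration only, here is a minimal sketch of the flipping-only special case: each cell is assumed to expose hypothetical (left, right) diffusion-edge heights, flipping a cell swaps its edges, and each height mismatch between adjacent edges counts as one diffusion step. The paper's algorithm additionally handles relocation, reordering, multiple rows, and displacement/flipping costs, none of which are modeled here.

```python
# Illustrative flipping-only DP for NDE-aware single-row placement.
# cells[i] = (left_height, right_height); flipping swaps the two edges.
# A mismatch between the right edge of cell i and the left edge of cell
# i+1 counts as one diffusion step; we minimize the total step count.

def min_nde_steps(cells):
    """Return the minimum number of diffusion steps achievable by flipping."""
    INF = float("inf")
    best = [0, 0]                               # cost per orientation of cell 0
    prev_right = [cells[0][1], cells[0][0]]     # right edge per orientation
    for left, right in cells[1:]:
        edges = [(left, right), (right, left)]  # (left, right) per orientation
        new_best, new_right = [INF, INF], [0, 0]
        for o, (le, re) in enumerate(edges):
            for po in (0, 1):                   # previous cell's orientation
                cost = best[po] + (prev_right[po] != le)
                if cost < new_best[o]:
                    new_best[o] = cost
            new_right[o] = re
        best, prev_right = new_best, new_right
    return min(best)
```

For example, the row [(2, 3), (3, 2), (2, 2)] already chains without a step, while [(1, 1), (2, 2)] forces one step no matter how the cells are flipped.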
Online message delay prediction for model predictive control over controller area network
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203776
A. Rao, Haibo Zeng
Today's Cyber-Physical Systems (CPS) are typically distributed over several computing nodes communicated through buses such as Controller Area Network (CAN). Their control performance gets degraded due to variable delays incurred by messages on the shared CAN bus. This paper presents a novel online delay prediction method that predicts the message delay at runtime based on real-time traffic information on CAN. It leverages the proposed method to improve control quality, by compensating the message delay in the Model Predictive Control (MPC) algorithm design. It demonstrates that the delay prediction is accurate, and the MPC design which takes the message delay into consideration performs considerably better. It also implements the proposed method on an 8-bit 16MHz ATmega328P microcontroller and measures the execution time overhead. The results clearly indicate that the method is computationally feasible for online usage.
{"title":"Online message delay prediction for model predictive control over controller area network","authors":"A. Rao, Haibo Zeng","doi":"10.1109/ICCAD.2017.8203776","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203776","url":null,"abstract":"Today's Cyber-Physical Systems (CPS) are typically distributed over several computing nodes communicated through buses such as Controller Area Network (CAN). Their control performance gets degraded due to variable delays incurred by messages on the shared CAN bus. This paper presents a novel online delay prediction method that predicts the message delay at runtime based on real-time traffic information on CAN. It leverages the proposed method to improve control quality, by compensating the message delay in the Model Predictive Control (MPC) algorithm design. It demonstrates that the delay prediction is accurate, and the MPC design which takes the message delay into consideration performs considerably better. It also implements the proposed method on an 8-bit 16MHz ATmega328P microcontroller and measures the execution time overhead. The results clearly indicate that the method is computationally feasible for online usage.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133805446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
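The compensation step this abstract describes (folding a predicted message delay into the MPC) can be sketched under the standard assumption of a discrete-time linear plant x_{k+1} = A x_k + B u_k. The function below is hypothetical and not the paper's predictor, which derives the delay from real-time CAN traffic; here the predicted delay is simply given as a number of control periods.

```python
import numpy as np

def delay_compensated_state(A, B, x, pending_u, d_steps):
    """Roll the plant x_{k+1} = A x_k + B u_k forward by the predicted delay
    (d_steps control periods), applying the controls already in flight, so the
    MPC optimizes from the state at actuation time rather than at sensing time.
    Assumes pending_u holds at least one control; the last one is held."""
    for k in range(d_steps):
        u = pending_u[k] if k < len(pending_u) else pending_u[-1]
        x = A @ x + B @ u
    return x
```

The MPC then solves its horizon problem starting from the returned state, so the control it computes is matched to the instant the delayed CAN message actually reaches the actuator.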
Hybrid state machine model for fast model predictive control: Application to path tracking
Pub Date : 2017-11-13 DOI: 10.1109/ICCAD.2017.8203777
M. Amir, T. Givargis
Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design in the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants for a specific time horizon. Ordinary Differential Equations (ODE) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs which pose scalability, performance and power consumption challenges. One approach to address these model complexity challenges are frameworks that automate the development of model-to-model transformation. In this paper, we introduce a model generation framework to transform ODE models of a physical system to Hybrid Harmonic Equivalent State (HES) Machine model equivalents. Moreover, tuning parameters are introduced to reconfigure the model and adjust its accuracy from coarse-grained time critical situations to fine-grained scenarios in which safety is paramount. Machine learning techniques are applied to adopt the model to run-time applications. We conduct experiments on a closed-loop MPC for path tracking using the vehicle dynamics model. We analyze the performance of the MPC when applying our Hybrid HES Machine model. The performance of our proposed model is compared with state-of-the-art ODE-based models, in terms of execution time and model accuracy. Our experimental results show a 32% reduction in MPC return time for 0.8% loss in model accuracy.
{"title":"Hybrid state machine model for fast model predictive control: Application to path tracking","authors":"M. Amir, T. Givargis","doi":"10.1109/ICCAD.2017.8203777","DOIUrl":"https://doi.org/10.1109/ICCAD.2017.8203777","url":null,"abstract":"Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design in the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants for a specific time horizon. Ordinary Differential Equations (ODE) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs which pose scalability, performance and power consumption challenges. One approach to address these model complexity challenges are frameworks that automate the development of model-to-model transformation. In this paper, we introduce a model generation framework to transform ODE models of a physical system to Hybrid Harmonic Equivalent State (HES) Machine model equivalents. Moreover, tuning parameters are introduced to reconfigure the model and adjust its accuracy from coarse-grained time critical situations to fine-grained scenarios in which safety is paramount. Machine learning techniques are applied to adopt the model to run-time applications. We conduct experiments on a closed-loop MPC for path tracking using the vehicle dynamics model. We analyze the performance of the MPC when applying our Hybrid HES Machine model. The performance of our proposed model is compared with state-of-the-art ODE-based models, in terms of execution time and model accuracy. 
Our experimental results show a 32% reduction in MPC return time for 0.8% loss in model accuracy.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"47 9-10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132727642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
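To make the receding-horizon loop this abstract refers to concrete, here is a toy MPC step for straight-line path tracking. It uses a kinematic unicycle model and exhaustive search over a few candidate turn rates, not the paper's HES machine or vehicle dynamics model; every parameter below is an assumption for illustration.

```python
import math

def mpc_step(x, y, theta, path_y, v=1.0, dt=0.1, horizon=5,
             candidates=(-0.5, -0.25, 0.0, 0.25, 0.5)):
    """Pick the constant turn rate over the horizon that minimizes the summed
    squared cross-track error to the straight reference path y = path_y,
    simulating a kinematic unicycle forward for `horizon` steps."""
    best_w, best_cost = 0.0, float("inf")
    for w in candidates:
        cx, cy, ct, cost = x, y, theta, 0.0
        for _ in range(horizon):
            cx += v * math.cos(ct) * dt   # advance position with current heading
            cy += v * math.sin(ct) * dt
            ct += w * dt                  # apply the candidate turn rate
            cost += (cy - path_y) ** 2
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w
```

A vehicle above the path (y > path_y, heading along +x) is steered toward negative turn rates, and symmetrically below it; a real MPC replaces the enumeration with an optimizer and re-solves this at every control period, applying only the first move.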
Journal
2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)