
2018 19th International Symposium on Quality Electronic Design (ISQED): Latest Papers

A post-silicon hold time closure technique using data-path tunable-buffers for variation-tolerance in sub-threshold designs
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357310
Divya Akella, Xinfei Guo, H. Patel, M. Stan, B. Calhoun
This paper presents a post-silicon hold time closure technique for performance-relaxed, sub-threshold digital designs using tunable-buffer insertion in hold-critical data-paths. Hold time closure in flip-flop based digital circuits is highly critical because hold failures cannot be corrected post-fabrication. This criticality increases in the sub-threshold domain, which is highly sensitive to process, voltage, and temperature variations. Design-time hold margins enable robust hold time closure across variations. However, insufficient hold margins can lead to chip failures and overestimated hold margins introduce additional costs in area and power. In this paper, we propose a post-silicon hold time closure methodology that introduces tunable-buffers in the data-path. This enables post-silicon correction of hold violations and therefore, reduces the design effort in estimating design-time hold margins. We design a tunable-buffer, demonstrate the tunable-buffer insertion strategy, and present a physical design flow using standard EDA tools. We verify this technique with measurements of a 130 nm test chip. A design-dependent hold slack improvement in the range of 103%–195% is achieved compared to the traditional buffering technique, with minimal power and area overhead. This technique also has the potential to reduce the number of buffers inserted for hold closure.
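The hold-time reasoning in this abstract can be made concrete with a small sketch. It uses the textbook hold-slack relation (earliest data arrival minus the hold requirement plus clock skew) and then picks the smallest tunable-buffer delay that brings a measured slack back to zero or above. The function names, the delay values, and the selection policy are illustrative assumptions, not the authors' buffer design or insertion strategy.

```python
from typing import Optional, Sequence


def hold_slack(t_clk_to_q: float, t_data_min: float,
               t_hold: float, t_skew: float) -> float:
    """Hold slack (ns): earliest data arrival minus the time data must stay stable."""
    return (t_clk_to_q + t_data_min) - (t_hold + t_skew)


def pick_buffer_setting(base_slack: float,
                        buffer_delays: Sequence[float]) -> Optional[int]:
    """Return the index of the smallest extra delay that closes hold, or None.

    buffer_delays lists the hypothetical delay (ns) added by each tuning setting.
    """
    for idx, delay in sorted(enumerate(buffer_delays), key=lambda p: p[1]):
        if base_slack + delay >= 0.0:
            return idx
    return None


# Hypothetical post-silicon numbers: the path violates hold by 0.5 ns.
measured = hold_slack(t_clk_to_q=1.0, t_data_min=0.5, t_hold=1.2, t_skew=0.8)
setting = pick_buffer_setting(measured, buffer_delays=[0.0, 0.3, 0.6, 0.9])
```

Choosing the smallest sufficient delay keeps the added data-path delay, and so the setup-time penalty, as low as possible; a real flow would also have to check setup slack for the chosen setting.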
Citations: 0
A technique to aggregate classes of analog fault diagnostic data based on association rule mining
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357294
Ruslan Dautov, S. Mosin
Analog circuits are widely used in fields such as medicine, the military, and aviation, and are critical for the development of reliable electronic systems. Testing and diagnosis are important tasks that detect and localize defects in the circuit under test and improve the quality of the final product. Because of the tolerances of internal components, the output responses of fault-free and faulty analog circuits can take an infinite set of values. Data mining methods may improve the quality of fault diagnosis when large volumes of data are processed. A technique for aggregating the classes of fault diagnostic responses, based on association rule mining, is proposed. The technique follows the simulation-before-test concept: as preprocessing of the output signals, a fault dictionary is generated by collecting the wavelet transform coefficients for fault-free and faulty conditions. The classifier is based on the k-nearest neighbors (k-NN) method and an association rule mining algorithm. The fault diagnostic technique was trained and tested using data obtained by simulating the fault-free and faulty behavior of an analog filter. The resulting accuracy in classifying faulty conditions and the fault coverage are more than 99.09% and more than 99.08%, respectively. The proposed technique is completely automated and can be extended.
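A minimal sketch of the simulation-before-test idea follows: simulated responses are turned into wavelet coefficients, stored in a fault dictionary, and a measured response is matched to the closest entry. The abstract does not name the wavelet, so a single level of an unnormalized Haar transform is assumed here; the labels, signals, and the nearest-entry lookup are hypothetical, and the association rule mining stage is not modelled.

```python
import math
from typing import Dict, List


def haar_level(signal: List[float]) -> List[float]:
    """One level of an unnormalized Haar transform: pairwise means, then pairwise half-differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2.0 for a, b in pairs]
    detail = [(a - b) / 2.0 for a, b in pairs]
    return approx + detail


def build_dictionary(responses: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Map each simulated condition label to its wavelet-coefficient vector."""
    return {label: haar_level(sig) for label, sig in responses.items()}


def diagnose(dictionary: Dict[str, List[float]], measured: List[float]) -> str:
    """Return the label whose stored coefficients are closest (Euclidean) to the measurement."""
    feats = haar_level(measured)
    return min(dictionary, key=lambda label: math.dist(dictionary[label], feats))


# Hypothetical simulated responses for one fault-free and one faulty condition.
fault_dict = build_dictionary({
    "fault_free": [1.0, 1.0, 0.0, 0.0],
    "R1_open": [0.5, 0.5, 0.5, 0.5],
})
label = diagnose(fault_dict, [0.9, 1.1, 0.1, -0.1])
```

Working on coefficients rather than raw samples is what lets a tolerance-perturbed response still land near its nominal dictionary entry.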
Citations: 4
Synthesis of normally-off boolean circuits: An evolutionary optimization approach utilizing spintronic devices
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357264
A. Roohi, Ramtin Zand, R. Demara
In this paper, we develop an evolutionary-driven circuit optimization methodology, which can be leveraged for the synthesis of spintronic-based normally-off computing (NoC) circuits. NoC architectures distribute nonvolatile memory elements throughout the CMOS logic plane, creating a new class of fine-grained, functionally-constrained synthesis challenges. The synthesis objectives for spin-based NoC circuits include increased computational throughput and reduced static power consumption. Our proposed methodology utilizes Genetic Algorithms (GAs) to optimize the implementation of a Boolean logic expression in terms of area, delay, or power consumption. It first leverages the spin-based device characteristics to achieve a primary semi-optimized implementation; further performance optimization is then applied to the implemented design based on the NoC requirements and optimization criteria. As a proof of concept, the optimization approach is used to implement a functionally complete set of Boolean logic gates using spin Hall effect (SHE) magnetic tunnel junctions (MTJs), which are optimized for both power and delay objectives. Such NoC synthesis methodologies support NoC circuit design for emerging-device and hybrid CMOS logic applications. Finally, simulation results and analyses verify the functionality of our proposed optimization tool for NoC circuit implementations.
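To illustrate what a genetic algorithm over Boolean implementations can look like, here is a small generic sketch: a genome is a fixed-length chain of 2-input gates, and evolution searches for a chain whose last gate matches a target truth table (XOR is chosen arbitrarily). Everything here is an assumption made for illustration, including the gate set, the encoding, and the fitness; the paper's fitness would involve area, delay, or power of spin-based devices, which would enter as extra cost terms.

```python
import random
from itertools import product

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}
OP_NAMES = list(OPS)
N_INPUTS, N_GATES = 2, 4
TARGET = [a ^ b for a, b in product((0, 1), repeat=N_INPUTS)]  # XOR truth table


def random_gene(pos, rng):
    # Gate `pos` may read any primary input or any earlier gate output.
    n_src = N_INPUTS + pos
    return (rng.choice(OP_NAMES), rng.randrange(n_src), rng.randrange(n_src))


def evaluate(genome, inputs):
    signals = list(inputs)
    for op, a, b in genome:
        signals.append(OPS[op](signals[a], signals[b]))
    return signals[-1]  # the last gate drives the circuit output


def fitness(genome):
    # Number of truth-table rows the circuit reproduces (0..4).
    rows = product((0, 1), repeat=N_INPUTS)
    return sum(evaluate(genome, r) == t for r, t in zip(rows, TARGET))


def evolve(pop_size=30, generations=40, mutation_rate=0.2, seed=1):
    rng = random.Random(seed)
    pop = [[random_gene(i, rng) for i in range(N_GATES)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]          # truncation selection; elite survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, N_GATES)   # one-point crossover keeps gate positions aligned
            child = p1[:cut] + p2[cut:]
            child = [random_gene(i, rng) if rng.random() < mutation_rate else g
                     for i, g in enumerate(child)]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), history


best, history = evolve()
```

Because the elite half is carried over unchanged, the best fitness recorded per generation never decreases; whether the target function is fully matched depends on the seed and the budget.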
Citations: 1
Logic-based row redundancy technique designed in 7nm FinFET technology for embedded SRAMs
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357300
V. Nautiyal, N. Nukala, F. Bohra, S. Dwivedi, J. Dasani, Satinderjit Singh, G. Singla, M. Kinkade
In this paper, a row-redundancy circuit using latches is designed for a 7nm FinFET ultra-high-density SRAM operating at 1.75 GHz. Input and faulty addresses are compared in parallel with the memory read access operation, thus avoiding a major impact on access or address setup time. Latch output data is multiplexed with memory data, and the impact on access time is only 7ps at the SS/0.675V/-40°C corner. Data is written to the redundant latches only when the address comparison matches. The proposed circuit is implemented with no setup time impact, and the overall area overhead of the proposed row redundancy scheme is 82% lower than that of the conventional redundancy scheme.
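The data flow described here (compare the address, write the latches only on a match, multiplex latch data with array data on a read) can be mimicked with a short behavioural model. This is a sketch under assumptions: the class name is hypothetical, a faulty row is modelled as stuck at 0, and the parallel address comparison is only represented by evaluation order, not by timing.

```python
class RowRedundantSram:
    """Behavioural model: one redundant latch row per known-faulty address."""

    def __init__(self, rows, faulty_rows):
        self.memory = [0] * rows
        self.latches = {addr: 0 for addr in faulty_rows}

    def write(self, addr, data):
        match = addr in self.latches
        # Assumption: a faulty array row cannot hold data (modelled as stuck at 0).
        self.memory[addr] = 0 if match else data
        if match:
            # Redundant latches are written only when the address comparison matches.
            self.latches[addr] = data

    def read(self, addr):
        mem_data = self.memory[addr]   # normal array access proceeds regardless
        match = addr in self.latches   # comparison conceptually runs alongside the access
        return self.latches[addr] if match else mem_data  # output multiplexer


sram = RowRedundantSram(rows=8, faulty_rows=[3])
sram.write(3, 0xAB)   # lands in the redundant latches
sram.write(4, 0xCD)   # lands in the array
```

Reading address 3 returns the latch contents even though the array row holds nothing useful, which is the behaviour the output multiplexer provides.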
Citations: 2
Quantized neural networks with new stochastic multipliers
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357316
Bingzhe Li, M. Najafi, Bo Yuan, D. Lilja
With increasing interest in neural networks, their hardware implementations have been widely investigated. Researchers pursue low hardware cost by using techniques such as stochastic computing and quantization. For example, quantization can reduce the total number of trained weights, resulting in low hardware cost. Stochastic computing aims to lower hardware cost substantially by using simple gates instead of complex arithmetic operations. In this paper, we propose a new stochastic multiplier with shifted unary code adders (SUC-Adder) for quantized neural networks. The new design exploits the characteristics of quantized weights and greatly reduces the hardware cost of neural networks. Experimental results indicate that our stochastic design achieves about a 10x energy reduction compared to its binary counterpart, at the cost of slightly higher recognition error rates.
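As background for the claim that stochastic computing replaces arithmetic with simple gates, the conventional unipolar scheme multiplies two values in [0, 1] by AND-ing two independent random bit-streams. The sketch below shows that conventional scheme only; it is not the SUC-Adder multiplier proposed in the paper, and the stream length and seed are arbitrary assumptions.

```python
import random


def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stochastic_multiply(a, b, length=4096, seed=0):
    """Estimate a * b as the fraction of ones in the bitwise AND of two streams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)  # drawn after stream_a, so the two are independent
    product_stream = [x & y for x, y in zip(stream_a, stream_b)]  # one AND gate per bit
    return sum(product_stream) / length


estimate = stochastic_multiply(0.5, 0.25)  # close to 0.125, with random error
```

The estimate is only approximate: its error shrinks roughly with the square root of the stream length, which is the accuracy-versus-cost trade-off behind the error rates mentioned in the abstract.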
Citations: 15
Recognition of regular layout structures
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357268
Yu-Cheng Chiang, Shr-Cheng Tsai, Rung-Bin Lin
This paper presents an algorithm for finding array structures in a layout design. The algorithm can find all the regular layout structures from a flattened layout design without knowing its building blocks beforehand. A potential application of this work is to reduce layout DRC and lithography check time. Experimental results show that our algorithm is efficient and robust.
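The abstract does not describe the algorithm itself, so the following is only a toy illustration of what finding an array in a flattened layout means: rectangles with the same size on the same row are grouped, and runs with a constant horizontal pitch are reported. The function name, the integer coordinates, and the restriction to one-dimensional rows are assumptions made for this sketch.

```python
from collections import defaultdict


def find_row_arrays(rects, min_count=3):
    """rects: iterable of (x, y, w, h) with integer coordinates.

    Returns a list of ((y, w, h), start_x, pitch, count) for each run of at
    least min_count identical shapes placed at a constant pitch along x.
    """
    groups = defaultdict(list)
    for x, y, w, h in rects:
        groups[(y, w, h)].append(x)

    arrays = []
    for key, xs in sorted(groups.items()):
        xs = sorted(set(xs))
        i = 0
        while i < len(xs) - 1:
            pitch = xs[i + 1] - xs[i]
            j = i + 1
            # Extend the run while the spacing stays equal to the first pitch.
            while j + 1 < len(xs) and xs[j + 1] - xs[j] == pitch:
                j += 1
            count = j - i + 1
            if count >= min_count:
                arrays.append((key, xs[i], pitch, count))
                i = j + 1
            else:
                i += 1
    return arrays


layout = [(0, 0, 2, 5), (10, 0, 2, 5), (20, 0, 2, 5), (30, 0, 2, 5),
          (37, 0, 2, 5), (4, 9, 1, 1)]
found = find_row_arrays(layout)
```

Once such a run is known, a checker could in principle verify one instance and its neighbourhood instead of every copy, which is the kind of DRC and lithography time saving the abstract points to.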
Citations: 1
Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357279
Jihyun Ryoo, Meenakshi Arunachalam, R. Khanna, M. Kandemir
Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels. This in turn requires efficient implementations of frequently used ML kernels on state-of-the-art multicores and many-cores acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, namely K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: NVIDIA GPU and Intel Xeon Phi (both KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate various optimizations that can be applied to both GPU and Xeon Phi, as well as optimizations that are specific to either GPU or Xeon Phi. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by using both general-purpose and accelerator-specific optimizations, one can achieve average speedups of 0.49x–3.48x (training) and 1.43x–9.41x (classification) on the Xeon Phi series, compared to 0.05x–0.60x (training) and 1.61x–6.32x (classification) achieved by the GPU version, both relative to the standard host-only system.
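For reference, the KNN classification kernel in its plainest form is a brute-force distance scan followed by a majority vote, as in the host-side sketch below. The function name and sample data are hypothetical, and none of the GPU or Xeon Phi optimizations evaluated in the paper are represented.

```python
import heapq
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label). Returns the majority label of the k nearest points."""
    # Brute force: one distance per training point, keeping only the k smallest.
    nearest = heapq.nsmallest(k, train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
prediction = knn_classify(train, (0.2, 0.1))
```

The distance computations are independent of one another, which is why this kernel is a natural fit for throughput-oriented hardware.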
Citations: 4
Routing at compile time
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357283
Chun-Xun Lin, Tsung-Wei Huang, Martin D. F. Wong
The rapid evolution of the modern C++ programming language has completely changed the way developers write high-performance and robust applications. By modern, we mean C++17, which has revolutionized the “old-fashioned” C++98 in many aspects such as meta-programming, concurrency controls, and functional programming. Despite the tremendous progress in language innovation, research on how these advanced features can improve EDA programs is still nascent. In this paper, we introduce a novel routing framework using the technique of generalized constant expressions in C++17. Our framework allows a router to take advantage of compile-time computation and thus can save a significant amount of engineering effort that would otherwise be expended every time the program runs. By prescribing computation at compile time, the compiler is able to produce more optimized code that runs faster than before. We have evaluated our framework on classic routing problems and have demonstrated promising performance gains over routing done solely at runtime. Our framework has the potential to change many fundamental EDA building blocks and thus can achieve better tool performance and engineering productivity.
Citations: 0
Securing FPGA-based obsolete component replacement for legacy systems
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357320
Zhiming Zhang, L. Njilla, C. Kamhoua, K. Kwiat, Qiaoyan Yu
Component-aging is unavoidable in legacy systems. Although re-designing the system typically results in a high cost, the need to replace aged components for legacy systems is an urgent priority. Unfortunately, the aged components are likely to be obsolete and not available on the current market. Obsolete component replacement with field-programmable gate array (FPGA) devices is emerging as a feasible option to extend the lifetime of legacy systems. While replacing the aged component, we traditionally only focus on matching the functionality and neglect the potential security threats from FPGA replacement. However, recent literature demonstrates that FPGA devices may contain hardware Trojans, which are induced during FPGA device fabrication or bitstream generation time. To prevent the Trojans on FPGA from receiving external inputs or leaking sensitive information, we propose a Runtime Pin Grounding (RPG) scheme to ground the unused pins and check the pin status at every clock cycle. Furthermore, we exploit the principle of moving target defense (MTD) and propose a hardware MTD (HMTD) method. In our method, the aged obsolete unit is replicated to multiple copies in the FPGA device, and two of the replicas are randomly selected for output comparison and thus Trojan detection. We successfully implemented the proposed RPG and HMTD methods on a Nexys-3 FPGA board. Our case study shows that the proposed RPG scheme increases the FPGA utilization rate by less than 0.1%. On average, our HMTD method reduces the hardware Trojan bypass rate by 61% over the existing method.
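The replica-comparison step of the HMTD idea can be sketched as a behavioural model: several copies of the same unit exist, two are picked at random for each check, and a disagreement between their outputs flags a possible Trojan. The unit function, the Trojan trigger value, and the helper names are hypothetical; the RPG pin check and the actual FPGA implementation are not modelled.

```python
import random


def clean_unit(x):
    """Stand-in for the legacy component's function (8-bit increment)."""
    return (x + 1) & 0xFF


def trojan_unit(x):
    """Hypothetical infected replica: corrupts the output for one trigger input only."""
    return 0x00 if x == 0xAB else clean_unit(x)


def check_random_pair(replicas, x, rng):
    """Pick two distinct replicas at random and compare their outputs for input x."""
    i, j = rng.sample(range(len(replicas)), 2)
    mismatch = replicas[i](x) != replicas[j](x)
    return i, j, mismatch


rng = random.Random(0)
replicas = [trojan_unit, clean_unit, clean_unit]
i, j, mismatch = check_random_pair(replicas, 0xAB, rng)
```

In this model a triggered Trojan is caught whenever the infected replica is one of the two selected, and because the pair changes unpredictably, an attacker cannot know in advance which copies will be compared.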
Citations: 10
Energy efficient neuromorphic processing using spintronic memristive device with dedicated synaptic and neuron terminology
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357266
Z. Pajouhi
Research on brain-inspired computing based on beyond-CMOS devices has gained momentum in recent years. The motivation behind this vigorous research lies in exploiting the resemblance between the computing principles and the device characteristics. To this end, the devices are used to perform the otherwise time-consuming and power-hungry tasks required for brain-inspired computing. Owing to their miniaturized dimensions, zero leakage, and nonvolatility, spintronic devices are among the most promising classes of beyond-CMOS devices. In this paper, we propose a novel spintronic structure based on antiferromagnetically coupled domain walls. The device structure enables dedicated terminology for synaptic and neuron connections. This characteristic enables more efficient design of neuromorphic systems by giving designers a larger design space. Furthermore, thanks to the coupling between the domain walls, the device can potentially operate at higher speeds while maintaining the same energy consumption; this higher speed contributes to improved performance of the neuromorphic system. To evaluate the proposed device structure, we developed a cross-layer simulation framework that analyzes the neuromorphic system at the device, circuit, and algorithm levels. Our simulation results show an order-of-magnitude improvement in energy consumption compared to CMOS and analog neurons, and up to 2× performance improvement as well as an 8% improvement in energy over state-of-the-art neuromorphic platforms using spintronic devices.
Citations: 1