
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Publications

Exploiting Near-Memory Processing Architectures for Bayesian Neural Networks Acceleration
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00045
Yinglin Zhao, Jianlei Yang, Xiaotao Jia, Xueyan Wang, Zhaohao Wang, W. Kang, Youguang Zhang, Weisheng Zhao
Bayesian inference is an effective approach to capturing model uncertainty and tackling the over-fitting problem in deep neural networks. Bayesian neural networks (BNNs) have recently become increasingly popular and have succeeded in many recognition tasks. However, the BNN inference procedure requires numerous memory access operations because of the sampled networks it generates. In this paper, a near-memory architecture is proposed for accelerating BNN inference by introducing additional memory units close to the processing units. The near-memory architecture caches frequently accessed data to reduce data movement efficiently. Minimizing the expensive data movement between memory units and computation units cuts down both latency and energy consumption. Compared with the traditional approach, simulation results show that the proposed architecture reduces energy consumption by 9% and achieves a 1.6× speedup at the cost of a 4% area overhead.
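To illustrate why BNN inference is memory-intensive, the sketch below runs a single Bayesian layer in which the weights are re-sampled from stored (mu, sigma) parameters on every Monte Carlo forward pass. The layer sizes, activation, and sample count are illustrative, not taken from the paper, and the sketch models only the access pattern, not the proposed near-memory hardware.

```python
import numpy as np

def bnn_layer_forward(x, mu, sigma, rng):
    # Each forward pass draws a fresh weight sample from N(mu, sigma^2),
    # so mu and sigma must be fetched for every Monte Carlo run -- the
    # repeated weight traffic that near-memory caching aims to keep local.
    w = rng.normal(mu, sigma)
    return np.maximum(x @ w, 0.0)      # ReLU activation (illustrative)

rng = np.random.default_rng(0)
mu = rng.standard_normal((64, 32))     # hypothetical layer: 64 inputs, 32 outputs
sigma = 0.1 * np.ones((64, 32))
x = rng.standard_normal((1, 64))

# Predictive mean over several weight samples; more samples mean more memory reads.
outputs = [bnn_layer_forward(x, mu, sigma, rng) for _ in range(16)]
pred = np.mean(outputs, axis=0)
```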
Citations: 0
A One-Cycle FIFO Buffer for Memory Management Units in Manycore Systems
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00056
A. Gordon-Ross, S. Abdel-Hafeez, Mohamad Hammam Alsafrjalani
We present an efficient synchronous first-in first-out (FIFO) buffer for enhanced memory management units and inter-core data communication in manycore systems. Our design significantly reduces hardware overhead and avoids added latency by using both the rising and falling clock edges during reads and writes, which makes it suitable for increasing processing element (PE) utilization by raising the memory bandwidth in complex network-on-chip and system-on-chip solutions. Compared to prior work, our design can operate 5X faster at the same supply voltage, or up to 44X faster with a 2.5X increase in supply voltage. The design's total power consumption is 7.8 mW with a total transistor count of 34,470.
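As a rough illustration of the dual-edge idea, the toy model below moves one word per clock edge instead of one per clock cycle. It is a behavioral throughput sketch only; the depth, cycle count, and producer/consumer pattern are made up, and it says nothing about the transistor-level design in the paper.

```python
from collections import deque

def run_fifo(cycles, transfers_per_cycle, depth=16):
    # transfers_per_cycle = 1 models a conventional single-edge FIFO;
    # transfers_per_cycle = 2 models using both rising and falling edges.
    fifo, delivered = deque(maxlen=depth), 0
    for _ in range(cycles):
        for _ in range(transfers_per_cycle):
            if len(fifo) < depth:
                fifo.append("word")     # producer writes one word
            if fifo:
                fifo.popleft()          # consumer reads one word
                delivered += 1
    return delivered

print(run_fifo(1000, 1), "vs", run_fifo(1000, 2))   # single-edge vs dual-edge throughput
```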
Citations: 6
Using Harmonized Parabolic Synthesis to Implement a Single-Precision Floating-Point Square Root Unit
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00116
Süleyman Savas, Y. Atwa, T. Nordström, Z. Ul-Abdin
This paper proposes a novel method for performing the square root operation on floating-point numbers represented in the IEEE-754 single-precision (binary32) format. The method is implemented using Harmonized Parabolic Synthesis. It is implemented both with and without pipeline stages and synthesized for two different Xilinx FPGA boards. The implementations show better resource usage and latency than other similar works, including the Xilinx intellectual property (IP) that uses the CORDIC method. Any method calculating the square root will make approximation errors. Unless these errors are distributed evenly around zero, they can accumulate and give a biased result. An attractive feature of the proposed method is that it distributes the errors evenly around zero, in contrast to CORDIC, for instance. Due to its small size, low latency, high throughput, and good error properties, the presented floating-point square root unit is suitable for high-performance embedded systems. It can be integrated into a processor's floating-point unit or be used as a stand-alone accelerator.
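The general structure of a binary32 square root (field extraction, exponent halving, significand approximation) can be sketched as below. A linear seed plus Newton refinement stands in for the paper's Harmonized Parabolic Synthesis, whose coefficients are not reproduced here, and only positive normal inputs are handled.

```python
import struct

def f32_sqrt_sketch(x: float) -> float:
    # Unpack the IEEE-754 binary32 fields (positive normal inputs only).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp = ((bits >> 23) & 0xFF) - 127           # unbiased exponent
    frac = 1.0 + (bits & 0x7FFFFF) / 2**23      # significand in [1, 2)
    # Make the exponent even so it can be halved exactly:
    # x = frac * 2^exp = (2 * frac) * 2^(exp - 1) when exp is odd.
    if exp & 1:
        frac *= 2.0
        exp -= 1
    # sqrt(x) = sqrt(frac) * 2^(exp/2); approximate sqrt(frac) on [1, 4).
    # A linear seed refined by two Newton-Raphson steps replaces the paper's
    # approximation; the coefficients are illustrative only.
    y = 0.41731 + 0.59016 * frac
    for _ in range(2):
        y = 0.5 * (y + frac / y)
    return y * 2.0 ** (exp // 2)

print(f32_sqrt_sketch(9.0))   # approximately 3.0
```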
Citations: 3
An Asynchronous Analog to Digital Converter for Video Camera Applications
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00040
R. Sunil, K. SiddharthR., Nithin Y. B. Kumar, M. H. Vasantha
This paper proposes an asynchronous analog-to-digital converter (ADC) for wireless surveillance video camera applications. The proposed architecture is based on nonuniform sampling, whose sampling instants depend on the input voltage amplitude. By using a power-down comparator, the proposed design offers a power advantage for input voltages close to the extreme values. The architecture is therefore suitable for applications in which the input signal rarely assumes values near the mid-range amplitude. The design is simulated at the transistor level in a 180-nm CMOS technology. The results show that about 96.7% of the power can be saved in the best case (input voltage in the vicinity of the extreme values) when compared to a conventional flash ADC.
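Amplitude-dependent sampling instants can be illustrated with a generic level-crossing sampler, sketched below. This is one common form of nonuniform sampling and is not the paper's circuit; the resolution, reference voltage, and test signal are made up.

```python
import numpy as np

def level_crossing_sample(signal, n_bits=4, vref=1.0):
    # Emit an (index, code) pair only when the input crosses a new
    # quantization level, so sampling instants follow the signal amplitude
    # rather than a uniform clock; slowly varying inputs yield few samples.
    lsb = vref / (2 ** n_bits)
    samples = []
    last_code = int(signal[0] // lsb)
    for i, v in enumerate(signal):
        code = int(v // lsb)
        if code != last_code:
            samples.append((i, code))
            last_code = code
    return samples

t = np.linspace(0.0, 1.0, 1000)
x = 0.5 + 0.45 * np.sin(2 * np.pi * 3 * t)       # slow test sine in [0.05, 0.95]
print(len(level_crossing_sample(x)), "samples instead of", len(x))
```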
Citations: 1
Logic Synthesis for Hybrid CMOS-ReRAM Sequential Circuits
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00084
Saman Froehlich, S. Shirinzadeh, R. Drechsler
Resistive Random Access Memory (ReRAM) is an emerging non-volatile technology with high scalability and zero standby power that allows logic primitives to be performed. ReRAM crossbar arrays combined with a CMOS substrate provide a wide range of benefits in logic synthesis. In this paper, we propose to exploit ReRAM in sequential circuits, as it provides the required features of both a computational and a memory element. We propose a fully automated synthesis approach based on graph representations (i.e., BDDs and AIGs) for the synthesis of sequential circuits on hybrid CMOS-ReRAM architectures. We propose an algorithm to efficiently divide the target function into two independent computational parts. This allows part of the computation to be merged into a ReRAM unit, utilizing its computational capabilities besides its function as a sequential element in order to minimize the CMOS overhead. Experimental results show that ReRAM allows a significant reduction in CMOS size of up to 40.9% (8.7% on average) for BDDs and up to 10.1% (3.2% on average) for AIGs.
Citations: 1
Securing a Wireless Network-on-Chip Against Jamming Based Denial-of-Service Attacks
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00065
Abhishek Vashist, Andrew Keats, Sai Manoj Pudukotai Dinakarrao, A. Ganguly
Wireless Networks-on-Chips (NoCs) have emerged as a panacea for the non-scalable multi-hop data transmission paths in traditional wired NoC architectures. Using low-power transceivers in NoC switches, novel Wireless NoC (WiNoC) architectures have been shown to achieve higher energy efficiency with improved peak bandwidth and reduced on-chip data transfer latency. However, using wireless interconnects for data transfer within a chip makes the on-chip communications vulnerable to various security threats from either external attackers or internal hardware Trojans (HTs). In this work, we propose a mechanism to make the wireless communication in a WiNoC secure against persistent jamming-based Denial-of-Service (DoS) attacks from both external and internal attackers. Persistent jamming attacks on the on-chip wireless medium cause interference in data transfer over the duration of the attack, resulting in errors in contiguous bits, known as burst errors. Therefore, we use a burst error correction code to monitor the rate of burst errors received over the wireless medium and deploy a Machine Learning (ML) classifier to detect the persistent jamming attack and distinguish it from random burst errors. In the event of a jamming attack, alternate routing strategies are proposed to avoid the DoS attack over the wireless medium, so that secure data transfer can be sustained even in the presence of jamming. We evaluate the proposed technique on a secure WiNoC in the presence of DoS attacks. With the proposed defense mechanisms, the WiNoC can outperform a wired NoC in terms of performance and security even in the presence of attacks. On average, 99.87% attack detection was achieved with the chosen ML classifiers. A bandwidth degradation of <3% is experienced in the event of an internal attack, while the wireless interconnects are disabled in the presence of an external attacker.
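The detection idea can be sketched as follows: measure how often received words contain multi-bit (burst) errors and flag jamming only when that rate stays high over a window. The simple threshold test below stands in for the paper's ML classifier, and the word format, window size, and threshold are illustrative.

```python
def burst_error_fraction(tx_words, rx_words):
    # Fraction of words in a window whose error pattern spans more than one
    # bit position (treated here as a burst, as opposed to an isolated flip).
    bursts = 0
    for t, r in zip(tx_words, rx_words):
        e = t ^ r
        if e == 0:
            continue
        span = e.bit_length() - (e & -e).bit_length() + 1
        if span > 1:
            bursts += 1
    return bursts / len(tx_words)

def jamming_suspected(window_rates, threshold=0.5):
    # Persistent jamming keeps the burst-error rate high across the whole
    # window; occasional random bursts leave most entries below threshold.
    return len(window_rates) > 0 and min(window_rates) > threshold

# Example: heavy contiguous corruption in every word of a window.
tx = [0b10110010] * 8
rx = [w ^ 0b00011100 for w in tx]          # a 3-bit burst in each word
print(jamming_suspected([burst_error_fraction(tx, rx)]))   # True
```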
Citations: 7
Energy-Efficient Embedded Inference of SVMs on FPGA
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00038
O. Elgawi, A. Mutawa, Afaq Ahmad
We propose an energy-efficient embedded binarized Support Vector Machine (eBSVM) architecture and present its implementation on a low-power FPGA accelerator. With binarized input activations and output weights, the dot-product operation (floating-point multiplications and additions) can be replaced by bitwise XNOR and popcount operations, respectively. The proposed accelerator computes the dot product of the two binarized vectors using Hamming weights, resulting in reduced execution time and energy consumption. Evaluation results show that eBSVM demonstrates improved performance and performance-per-watt on the MNIST and CIFAR-10 datasets compared to its fixed-point (FP) counterpart implemented on CPU and GPU, with a small accuracy degradation.
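To make the XNOR/popcount substitution concrete, the sketch below computes the dot product of two {-1, +1} vectors packed into integers. The bit-packing convention and vector length are illustrative, not the accelerator's actual data layout.

```python
def binarized_dot(a_bits: int, b_bits: int, n: int) -> int:
    # Bit = 1 encodes +1 and bit = 0 encodes -1. XNOR marks the positions
    # where the two signs agree; popcount counts the agreements, and the
    # dot product is (agreements) - (disagreements) = 2 * agreements - n.
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    agreements = bin(xnor).count("1")      # popcount
    return 2 * agreements - n

# a = [+1, -1, +1, +1], b = [+1, +1, -1, +1]  ->  1 - 1 - 1 + 1 = 0
print(binarized_dot(0b1011, 0b1101, 4))
```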
Citations: 6
A Novel Single/Double Precision Normalized IEEE 754 Floating-Point Adder/Subtracter
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00058
Brett Mathis, J. Stine
This paper demonstrates the design of a fully IEEE 754-compliant floating-point adder and subtractor. The design focuses on high speed and low power while still adhering completely to the IEEE 754 standard. Its novelty comes in the form of its 64-bit prefix adder structure and the parallelization of its subcomponents. The adder/subtractor has full support for 32-bit and 64-bit operands, as well as the ability to convert integer operands to the IEEE 754 standard. Synthesis results presented use a cmos32soi 32nm CMOS technology and ARM standard cells.
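The datapath stages such a design implements in hardware (unpack, exponent alignment, significand addition, renormalization) can be sketched in software as below. The sketch handles only positive normal binary32 operands, truncates instead of rounding, and is not the paper's prefix-adder architecture.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fp32_add_sketch(a: float, b: float) -> float:
    # Unpack exponent and significand, restoring the hidden leading 1.
    ba, bb = f32_bits(a), f32_bits(b)
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF
    ma, mb = (ba & 0x7FFFFF) | 0x800000, (bb & 0x7FFFFF) | 0x800000
    # Align the smaller operand to the larger exponent.
    if ea < eb:
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= (ea - eb)
    # Add significands; renormalize if the 24-bit result overflows.
    m, e = ma + mb, ea
    if m & 0x1000000:
        m >>= 1
        e += 1
    return bits_f32((e << 23) | (m & 0x7FFFFF))

print(fp32_add_sketch(1.5, 2.25))   # 3.75
```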
Citations: 4
When Neural Architecture Search Meets Hardware Implementation: from Hardware Awareness to Co-Design
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00014
Xinyi Zhang, Weiwen Jiang, Yiyu Shi, J. Hu
Neural Architecture Search (NAS), which automatically identifies the best network architecture, is a promising technique for responding to the ever-growing demand for application-specific Artificial Intelligence (AI). On the other hand, a large number of research efforts have been put into implementing and optimizing AI applications on hardware. Out of all leading computation platforms, Field Programmable Gate Arrays (FPGAs) stand out due to their flexibility and versatility over ASICs and their efficiency over CPUs and GPUs. To identify the best pair of neural architecture and hardware implementation, a number of research works are emerging that incorporate awareness of hardware efficiency into the NAS process, an approach called "hardware-aware NAS". Unlike conventional NAS with the single criterion of accuracy, hardware-aware NAS is a multi-objective optimization problem that aims to identify the best network and hardware pair to maximize accuracy with guaranteed hardware efficiency. Most recently, the co-design of neural architecture and hardware has been put forward to push the Pareto frontier of the accuracy-efficiency trade-off even further. This paper reviews and discusses the current progress in neural architecture search and its implementation on hardware.
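The multi-objective view can be illustrated by filtering candidate (accuracy, latency) pairs down to their Pareto frontier, as in the sketch below. The candidate designs and metrics are made up, and the sketch says nothing about how any particular NAS algorithm explores the search space.

```python
def pareto_frontier(candidates):
    # candidates: list of (name, accuracy, latency_ms).
    # A design survives iff no other design is at least as accurate AND at
    # least as fast, with at least one of the two strictly better.
    front = []
    for n1, acc1, lat1 in candidates:
        dominated = any(
            acc2 >= acc1 and lat2 <= lat1 and (acc2 > acc1 or lat2 < lat1)
            for n2, acc2, lat2 in candidates if n2 != n1
        )
        if not dominated:
            front.append((n1, acc1, lat1))
    return front

designs = [("A", 0.92, 8.0), ("B", 0.94, 12.0), ("C", 0.91, 9.5), ("D", 0.95, 20.0)]
print(pareto_frontier(designs))   # C is dominated by A; A, B, and D remain
```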
Citations: 24
ASSET: Architectures for Smart Security of Non-Volatile Memories
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00070
S. Swami, K. Mohanram
Computing systems that integrate advanced non-volatile memories (NVMs) are vulnerable to several security attacks that threaten (i) data confidentiality, (ii) data availability, and (iii) data integrity. This paper proposes Architectures for Smart Security of NVMs (ASSET), which integrates five low-overhead, high-performance security solutions (SECRET [1], COVERT [2], ACME [3], ARSENAL [4], and STASH [5]) to thwart these attacks on NVM systems. SECRET is a low-cost security solution that employs counter mode encryption (CME) for data confidentiality in multi-/triple-level cell (i.e., MLC/TLC) NVMs. COVERT and ACME complement SECRET to improve the system availability of CME. ARSENAL integrates CME and Bonsai Merkle Tree (BMT) authentication to thwart data confidentiality and integrity attacks, respectively, in NVMs and simultaneously enables instant data recovery (IDR) on power/system failures. Finally, STASH is the first comprehensive end-to-end security architecture for state-of-the-art smart hybrid memories (SHMs). STASH integrates (i) CME for data confidentiality, (ii) page-level MT authentication for data integrity, (iii) recovery-compatible MT updates to withstand power or system failures, and (iv) page-migration-friendly security meta-data management. This paper thus addresses the core security challenges of next-generation NVM systems.
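Counter mode encryption, the building block shared by SECRET, COVERT, ACME, and ARSENAL, can be sketched as below: a per-line counter and the line address seed a pad that is simply XORed with the data. SHA-256 stands in for the AES engine purely so the sketch runs with the standard library, and the line size, counter handling, and key are illustrative, not the papers' designs.

```python
import hashlib

def cme_pad(key: bytes, line_addr: int, counter: int, length: int = 64) -> bytes:
    # Pad derived from (address, counter); a real design would use AES here.
    seed = key + line_addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
    pad, block = b"", 0
    while len(pad) < length:
        pad += hashlib.sha256(seed + block.to_bytes(4, "little")).digest()
        block += 1
    return pad[:length]

def cme_write(key: bytes, addr: int, counter: int, plaintext: bytes):
    # The counter is bumped on every write so a pad is never reused; the
    # encryption itself is a cheap XOR, so pad generation can overlap the
    # memory access instead of sitting on the critical read/write path.
    counter += 1
    pad = cme_pad(key, addr, counter, len(plaintext))
    ciphertext = bytes(p ^ q for p, q in zip(plaintext, pad))
    return ciphertext, counter

key = b"\x01" * 16
ct, ctr = cme_write(key, 0x1000, 0, b"secret line data".ljust(64, b"\x00"))
pt = bytes(c ^ q for c, q in zip(ct, cme_pad(key, 0x1000, ctr, 64)))
assert pt.rstrip(b"\x00") == b"secret line data"
```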
Citations: 0