
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers

Design of Quantum Circuits for Cryptanalysis and Image Processing Applications
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00072
Edgard Muñoz-Coreas, H. Thapliyal
Quantum circuits for arithmetic functions over Galois fields, such as squaring, are required to implement quantum cryptanalysis algorithms. Quantum circuits for integer arithmetic, such as multiplication, are required to implement scientific computing and quantum image processing algorithms on quantum computers. Reliable quantum circuits require error-correcting codes and fault-tolerant gates. Quantum circuits with many qubits are challenging to implement, which makes designs with low qubit cost desirable. In this work, we present quantum arithmetic circuits for applications in quantum cryptanalysis and quantum image processing. We propose an algorithm for synthesizing Galois field (GF(2^m)) squaring circuits optimized for gate cost, qubit cost, and depth, for quantum cryptanalysis applications. These squaring circuits are then incorporated into a proposed quantum circuit for inversion in GF(2^m). This work also presents a quantum integer conditional addition circuit and a quantum integer multiplication circuit, both optimized for T-count and qubit cost. The conditional addition circuit and the multiplier are incorporated into proposed quantum circuits for bilinear interpolation, optimized for T-count, that can be used in quantum image processing applications.
Pages: 360-365
Citations: 2
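To make the GF(2^m) squaring operation from the abstract above concrete, here is a minimal classical sketch. It is not the authors' quantum circuit or synthesis algorithm. It only shows why squaring is a linear map over GF(2), which is the property that lets it be built from simple gates. The function name `gf2m_square` and the example polynomial x^3 + x + 1 are illustrative assumptions.

```python
def gf2m_square(a: int, m: int, poly: int) -> int:
    """Square the element `a` of GF(2^m).

    `a` is a bit vector of polynomial coefficients (bit i = coefficient of x^i).
    `poly` is the irreducible field polynomial, including its x^m term.
    """
    # Squaring over GF(2) has no cross terms, so bit i simply moves to bit 2*i.
    spread = 0
    for i in range(m):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
    # Reduce the degree-(2m-2) result modulo `poly`, highest degree first.
    for deg in range(2 * m - 2, m - 1, -1):
        if (spread >> deg) & 1:
            spread ^= poly << (deg - m)
    return spread


if __name__ == "__main__":
    # GF(2^3) with the (assumed, illustrative) polynomial x^3 + x + 1.
    for a in range(8):
        print(f"{a:03b}^2 = {gf2m_square(a, 3, 0b1011):03b}")
```

Because the map is linear, `gf2m_square(a ^ b, ...)` equals `gf2m_square(a, ...) ^ gf2m_square(b, ...)`, and applying it m times returns the original element.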
Formal Hardware Verification of InfoSec Primitives
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00034
M. Basiri, S. Shukla
Information Security (InfoSec) plays a major role in modern real-time applications. This paper proposes efficient, equivalence-check-based formal hardware verification schemes for various InfoSec primitives, such as the 128-bit Advanced Encryption Standard (AES), a Bose-Chaudhuri-Hocquenghem (BCH) encoder, and an m-bit GF(p) exponentiator (where p = log2m). The 128-bit AES is verified on an Artix-7 FPGA using Xilinx Vivado. The BCH encoder and the GF(p) exponentiator are verified in 45nm CMOS technology using Cadence. The synthesis results show that the proposed hardware-software co-design based formal hardware verification of 128-bit AES does not compromise resource utilization compared with various existing designs. Similarly, the proposed formal hardware verification of the BCH encoder with generator polynomial length 64 and of the 16-bit GF(p) exponentiator does not compromise delay compared with various existing techniques.
Pages: 140-145
Citations: 0
FAST: A Frequency-Aware Skewed Merkle Tree for FPGA-Secured Embedded Systems
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00066
Yu Zou, Mingjie Lin
Protecting external memory is important when an attacker can gain physical access to the external memory bus. Compared to general-purpose systems, embedded systems are more vulnerable to physical attacks because they are portable. One such attack is the replay attack, in which an attacker records data sent over a memory bus and replays it to pretend to be an authorized user. Traditionally, replay attacks are prevented using a full, balanced Merkle Tree. Designed for average-case performance on general-purpose systems, Merkle Tree traversal and verification incur a large latency overhead on each memory access. In contrast to general-purpose systems, embedded systems are normally application-specific, and their program behaviors and memory access patterns are deterministic. We also observe that, for a given program, not all memory locations are accessed equally often. Based on these two observations, we propose FAST, a Frequency-Aware Skewed Merkle Tree for application-specific embedded systems. We first profile a program in a simulation environment without any replay attack protection to obtain its memory access frequency distribution. We then design an automatic and systematic approach that generates an application-specific optimal skewed Merkle Tree from that distribution. We propose an efficient hardware architecture to accelerate FAST on FPGA. In experiments on five real-world benchmarks, our skewed Merkle Tree implementation outperforms the baseline, which uses a full balanced Merkle Tree, by up to 3 times.
Pages: 326-331
Citations: 18
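The idea of skewing a Merkle Tree by access frequency, as described in the FAST abstract above, can be sketched as follows. The abstract does not say how the authors construct their optimal tree. This sketch assumes a Huffman-style merge as one plausible construction, so that frequently accessed blocks sit closer to the root and need fewer hashes per verification. The function names and the access counts are hypothetical.

```python
import heapq
from itertools import count


def skewed_depths(freqs):
    """Return the leaf depth of each memory block in a Huffman-style tree.

    `freqs[i]` is the profiled access count of block i (assumed input).
    The depth equals the number of hash nodes on the path to the root.
    """
    tie = count()  # tie-breaker so heap entries never compare their lists
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    depths = [0] * len(freqs)
    while len(heap) > 1:
        f1, _, leaves1 = heapq.heappop(heap)
        f2, _, leaves2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        for leaf in leaves1 + leaves2:
            depths[leaf] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), leaves1 + leaves2))
    return depths


def expected_path_length(freqs, depths):
    """Average number of tree levels traversed per access."""
    return sum(f * d for f, d in zip(freqs, depths)) / sum(freqs)


if __name__ == "__main__":
    freqs = [8, 1, 1, 2, 1, 1, 1, 1]   # hypothetical profile of 8 blocks
    depths = skewed_depths(freqs)
    balanced = [3] * len(freqs)        # full balanced tree over 8 leaves
    print("skewed depths:", depths)
    print("skewed average:", expected_path_length(freqs, depths))
    print("balanced average:", expected_path_length(freqs, balanced))
```

The trade-off is that rarely accessed blocks end up deeper than in the balanced tree, so the gain depends on how uneven the profiled distribution is.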
Title Page i
Pub Date : 2019-07-01 DOI: 10.1109/isvlsi.2019.00001
Citations: 0
A Reconfigurable Layered-Based Bio-Inspired Smart Image Sensor
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00039
Pankaj Bhowmik, Md Jubaer Hossain Pantho, S. Saha, C. Bobda
This paper presents a hardware architecture that extracts features from an image using concepts from bio-inspired computing, and a method for converting sequential image processing into parallel computational units that can execute on the sensor. These computational units are arranged on vertically integrated hierarchical planes and equipped with a region-based Attention Module, which separates the Regions of Interest (ROIs) from the image. In each layer, the computational units work in parallel, providing massive parallelism at the pixel level. At the same time, the design saves dynamic power by dynamically enabling and disabling the computational units, which ensures high performance and high throughput. Moreover, the units are reconfigurable to support a wide range of machine vision applications: a basic structure common to all operations is combined with reconfigurable parts for a specific application. Our simulation results show that the design achieves 4.852X power savings on ROIs while processing at 465 Kfps with an 800 MHz clock frequency.
Pages: 169-174
Citations: 0
Tackling the Drawbacks of a Lagrangian Relaxation Based Discrete Gate Sizing Algorithm
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00059
Henrique Placido, R. Reis
The Lagrangian relaxation (LR) based gate sizer proposed in [1] has the best leakage power results published so far for the ISPD 2012 Gate Sizing Contest benchmarks. However, it requires many LR iterations and does not use any technique to filter cell option candidates in the LR subproblem solver. This paper presents extensions that address these drawbacks. To reduce the number of LR iterations, we propose enhancements to the original LR multiplier formula. We also use a scaling factor to properly scale timing cost and leakage power in the LR local cost. Moreover, we apply a cell option candidate filtering strategy to reduce the runtime of each LR iteration. Finally, we improve the post-processing timing recovery and power recovery. Our work achieves leakage power results very close to those of the original algorithm while taking, on average, 4.28x fewer LR iterations and 9.11x fewer cell swaps during LR.
Pages: 284-289
Citations: 0
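The abstract above mentions an LR local cost that combines scaled leakage power and timing cost when choosing a cell option for a gate. The exact formula is not given there, so the following is only a generic sketch of that kind of selection step: each candidate cell is scored as `alpha * leakage + lam * delay`, and the cheapest one is picked. The names `alpha`, `lam`, `best_cell_option`, and the cell data are all hypothetical.

```python
def best_cell_option(options, lam, alpha=1.0):
    """Pick the cell option with the lowest LR-style local cost.

    options: list of (name, leakage, delay) tuples (illustrative units).
    lam:     Lagrange multiplier weighting the timing cost (assumed scalar).
    alpha:   scaling factor applied to leakage power (assumed form).
    """
    def local_cost(option):
        _, leakage, delay = option
        return alpha * leakage + lam * delay

    return min(options, key=local_cost)[0]


if __name__ == "__main__":
    # Hypothetical drive strengths: larger cells leak more but are faster.
    cells = [("X1", 1.0, 10.0), ("X2", 2.0, 6.0), ("X4", 4.0, 4.5)]
    for lam in (0.0, 1.0, 10.0):
        print(f"lam={lam}: choose {best_cell_option(cells, lam)}")
```

As the multiplier grows, timing dominates the cost and the selection moves toward faster, leakier cells. With a multiplier of zero, the lowest-leakage cell always wins.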
Mitigating Reverse Engineering Attacks on Deep Neural Networks
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00122
Yuntao Liu, D. Dachman-Soled, Ankur Srivastava
With the structure of deep neural networks (DNN) being of increasing commercial value, DNN reverse engineering attacks have become a great security concern. It has been shown that the memory access pattern of a processor running DNNs can be exploited to decipher their detailed structure. In this work, we propose a defensive memory access mechanism which utilizes oblivious shuffle, address space layout randomization, and dummy memory accesses to counter such attacks. Experiments show that our defense exponentially increases the attack complexity with asymptotically lower memory access overhead compared to generic memory obfuscation techniques such as ORAM and is scalable to larger DNNs.
Pages: 657-662
Citations: 16
CSrram: Area-Efficient Low-Power Ex-Situ Training Framework for Memristive Neuromorphic Circuits Based on Clustered Sparsity
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00090
A. Fayyazi, Souvik Kundu, Shahin Nazarian, P. Beerel, Massoud Pedram
Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but pose arduous challenges in terms of storage and computation of network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for neural network accelerators based on in-memory computing. At the same time, the presence of a significant number of zero weights in ANNs has motivated research into a variety of parameter reduction techniques. However, for crossbar-based architectures, the study of efficient methods to take advantage of network sparsity is still at an early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets, including MNIST handwritten recognition, fashion MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state-of-the-art fully connected memristive neuromorphic circuits, our CSrram, with only 25% density of weights in the first junction, provides power and area efficiency improvements of 1.5x and 2.6x, respectively (averaged over five datasets), without any significant loss in test accuracy.
Pages: 465-470
Citations: 6
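The pre-defined block diagonal sparsity pattern named in the CSrram abstract above can be illustrated with a small mask generator. This is a sketch of a plain block diagonal mask only. The clustering step and the training flow of the paper are not reproduced, and the function names and layer sizes are assumptions. With four equal blocks, one quarter of the weights remain, which matches a 25% density figure in form, though the abstract does not state how that density was obtained.

```python
def block_diagonal_mask(n_in, n_out, n_blocks):
    """Build an n_in x n_out 0/1 mask that keeps only block-diagonal weights.

    Inputs and outputs are split into `n_blocks` equal groups, and input
    group k connects only to output group k.
    """
    if n_in % n_blocks or n_out % n_blocks:
        raise ValueError("layer sizes must be divisible by n_blocks")
    rows_per_block = n_in // n_blocks
    cols_per_block = n_out // n_blocks
    return [
        [1 if r // rows_per_block == c // cols_per_block else 0
         for c in range(n_out)]
        for r in range(n_in)
    ]


def density(mask):
    """Fraction of weights kept by the mask."""
    kept = sum(sum(row) for row in mask)
    return kept / (len(mask) * len(mask[0]))


if __name__ == "__main__":
    mask = block_diagonal_mask(8, 4, 4)  # hypothetical 8-input, 4-output layer
    for row in mask:
        print("".join(str(bit) for bit in row))
    print("density:", density(mask))
```

Each nonzero block maps naturally onto its own small crossbar, so the zero regions need no memristors at all, which is where an area saving would come from.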
An ESL Environment for Modeling Electrical Interconnect Faults
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00024
Nooshin Nosrati, Katayoon Basharkhah, Rezgar Sadeghi, Z. Navabi
This paper focuses on an ESL integrated environment for modeling communication channels at an abstract level and providing a mechanism for insertion of interconnect electrical faults into the channels for coverage analysis. The channels are designed for abstract initiator-target communications and have a general format that contains properties found in SystemC, TLM-1 and TLM-2.0 channels. This paper presents a relatively complex SystemC channel and shows how our suggested mechanism for crosstalk fault modeling can be inserted into the communication lines of the channel. Crosstalk models examined here are 1) at an abstract aggressor-victim level described by programming a SystemC channel, and 2) at the electrical level using SystemC-AMS. Results show correspondence of crosstalk faults at the two levels, and at the same time much faster simulations for the former.
Pages: 88-93
Citations: 3
Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing
Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00033
M. Alawad, G. Tourassi
Deep learning (DL) has been used for many natural language processing (NLP) tasks because it outperforms traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Reducing the number of word embedding parameters is therefore critical, especially as vocabulary size increases. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of trainable model parameters without noticeably sacrificing accuracy. Unlike other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embedding parameters by 98% and 99% for the Yelp and Amazon datasets, respectively, with no drop in accuracy.
Pages: 134-139
Citations: 2
Journal
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)