Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00072
Edgard Muñoz-Coreas, H. Thapliyal
Quantum circuits for arithmetic functions over Galois fields, such as squaring, are required to implement quantum cryptanalysis algorithms. Quantum circuits for integer arithmetic, such as multiplication, are required to implement scientific computing algorithms and quantum image processing algorithms on quantum computers. Reliable quantum circuits require error-correcting codes and fault-tolerant gates. Quantum circuits with many qubits are challenging to implement, making designs with low qubit cost desirable. In this work, we present quantum arithmetic circuits for applications in quantum cryptanalysis and quantum image processing. We propose an algorithm for synthesizing Galois field (GF(2^m)) squaring circuits optimized for gate cost, qubit cost, and depth for quantum cryptanalysis applications. In addition, these squaring circuits are incorporated into a proposed quantum circuit for inversion in GF(2^m). This work also presents a quantum integer conditional addition circuit and a quantum integer multiplication circuit optimized for T-count and qubit cost. The quantum conditional addition circuit and quantum multiplier are incorporated into proposed quantum circuits for bilinear interpolation, optimized for T-count, that can be used in quantum image processing applications.
Title: Design of Quantum Circuits for Cryptanalysis and Image Processing Applications. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 360-365.
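As a classical illustration of why squaring is a useful building block for inversion in GF(2^m), the sketch below squares an element by spreading its bits and reducing modulo the field polynomial, then inverts via Fermat's identity a^(-1) = a^(2^m - 2). This is a software model under stated assumptions, not the quantum circuit from the paper: the function names, the integer bit-vector encoding, and the example polynomial x^4 + x + 1 are all chosen here for illustration.

```python
def gf2m_reduce(value, modulus, m):
    """Reduce a GF(2)[x] polynomial (bits of `value`) modulo `modulus` of degree m."""
    for shift in range(value.bit_length() - 1, m - 1, -1):
        if (value >> shift) & 1:
            # XOR clears bit `shift` and only touches lower bits.
            value ^= modulus << (shift - m)
    return value


def gf2m_square(a, modulus, m):
    """Square in GF(2^m): coefficient a_i moves to position 2*i, then reduce.

    The map is linear over GF(2), which is why squaring needs no
    multiplications at all.
    """
    spread = 0
    for i in range(m):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
    return gf2m_reduce(spread, modulus, m)


def gf2m_mul(a, b, modulus, m):
    """Schoolbook carry-less multiplication followed by reduction."""
    product = 0
    for i in range(m):
        if (b >> i) & 1:
            product ^= a << i
    return gf2m_reduce(product, modulus, m)


def gf2m_inverse(a, modulus, m):
    """Fermat inversion: a^(2^m - 2) = product of a^(2^i) for i = 1..m-1."""
    result = 1
    power = a
    for _ in range(m - 1):
        power = gf2m_square(power, modulus, m)
        result = gf2m_mul(result, power, modulus, m)
    return result


# Hypothetical example field: GF(2^4) with x^4 + x + 1.
M = 4
MODULUS = 0b10011
```

The inversion loop uses m - 1 squarings and m - 1 multiplications; schemes that reduce the multiplication count further exist, but whether the paper uses one is not stated in the abstract.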
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00034
M. Basiri, S. Shukla
Information Security (InfoSec) plays a major role in modern real-time applications. This paper proposes efficient, equivalence-check-based formal hardware verification schemes for various InfoSec primitives such as the 128-bit Advanced Encryption Standard (AES), a Bose-Chaudhuri-Hocquenghem (BCH) encoder, and an m-bit GF(p) exponentiator (where p = log2m). The 128-bit AES is verified on an Artix-7 FPGA using Xilinx Vivado. The BCH encoder and GF(p) exponentiator are verified in 45nm CMOS technology using Cadence. The synthesis results show that the proposed hardware-software co-design based formal hardware verification of 128-bit AES does not compromise resource utilization compared with various existing designs. Similarly, the proposed formal hardware verification of a BCH encoder with generator polynomial length 64 and a 16-bit GF(p) exponentiator does not compromise delay compared with various existing techniques.
Title: Formal Hardware Verification of InfoSec Primitives. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 140-145.
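To make the idea of an equivalence check concrete, the sketch below compares a square-and-multiply exponentiator against a trusted reference (Python's built-in three-argument `pow`) on every input of a small prime field. This is only an analogy: formal equivalence checkers prove agreement symbolically rather than by enumeration, and the function names and the exhaustive-comparison approach are assumptions made here, not the paper's flow.

```python
def modexp_square_multiply(base, exponent, p):
    """Right-to-left square-and-multiply exponentiation modulo p (the 'design')."""
    result = 1 % p
    base %= p
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % p
        base = (base * base) % p
        exponent >>= 1
    return result


def equivalent_on_all_inputs(p, max_exponent):
    """Compare the design against the reference model pow(b, e, p) exhaustively."""
    return all(
        modexp_square_multiply(b, e, p) == pow(b, e, p)
        for b in range(p)
        for e in range(max_exponent + 1)
    )
```

Exhaustive comparison is only feasible for toy field sizes, which is the practical motivation for formal methods on 16-bit and wider datapaths.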
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00066
Yu Zou, Mingjie Lin
Protection of external memory is important when an attacker could gain physical access to the external memory bus. Compared to general-purpose systems, embedded systems are more vulnerable to physical attacks due to their portability. One such attack is the replay attack, in which an attacker records data sent over a memory bus and replays it to pretend to be an authorized user. Traditionally, replay attacks are prevented using a full, balanced Merkle Tree. Because this design targets average-case performance on general-purpose systems, traversal and verification of the Merkle Tree add a large latency overhead to each memory access. In contrast to general-purpose systems, embedded systems are normally application-specific, and program behaviors and memory access patterns are deterministic. We also observe that, for a given program, not all memory locations are accessed equally frequently. Based on these two observations, we propose FAST, a Frequency-Aware Skewed Merkle Tree for application-specific embedded systems. After profiling a program in a simulation environment without any replay attack protection, we obtain a memory access frequency distribution. We then design an automatic and systematic approach to generate an application-specific optimal skewed Merkle Tree accordingly. We propose an efficient hardware architecture to accelerate FAST on FPGA, and in experiments on five real-world benchmarks, our skewed Merkle Tree implementation outperforms a baseline that uses a full, balanced Merkle Tree by up to 3 times.
Title: FAST: A Frequency-Aware Skewed Merkle Tree for FPGA-Secured Embedded Systems. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 326-331.
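One way to picture a frequency-aware skewed hash tree is a Huffman-style construction, in which the two least frequently accessed subtrees are merged first so that hot memory blocks end up close to the root. The sketch below is an assumption made for illustration: the abstract does not say how FAST generates its tree, and the function names, the SHA-256 hash, and the use of leaf depth as a proxy for verification cost are all hypothetical.

```python
import hashlib
import heapq
import itertools


def sha256(data):
    return hashlib.sha256(data).digest()


def build_skewed_tree(blocks, frequencies):
    """Huffman-style merge of leaf hashes.

    Returns (root_hash, depths), where depths[i] is the number of hashing
    steps needed to verify block i against the root.
    """
    counter = itertools.count()  # unique tie-breaker so hashes are never compared
    heap = [
        (freq, next(counter), sha256(block), [index])
        for index, (block, freq) in enumerate(zip(blocks, frequencies))
    ]
    heapq.heapify(heap)
    depths = [0] * len(blocks)
    while len(heap) > 1:
        freq_a, _, hash_a, leaves_a = heapq.heappop(heap)
        freq_b, _, hash_b, leaves_b = heapq.heappop(heap)
        merged_leaves = leaves_a + leaves_b
        for leaf in merged_leaves:
            depths[leaf] += 1  # every leaf under the new parent moves one level down
        parent_hash = sha256(hash_a + hash_b)
        heapq.heappush(heap, (freq_a + freq_b, next(counter), parent_hash, merged_leaves))
    return heap[0][2], depths


def expected_depth(depths, frequencies):
    """Access-frequency-weighted average path length."""
    return sum(d * f for d, f in zip(depths, frequencies)) / sum(frequencies)
```

For eight blocks a balanced tree gives every leaf depth 3; with a strongly skewed profile the weighted average from `expected_depth` falls below that, at the cost of longer paths for rarely accessed blocks.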
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00039
Pankaj Bhowmik, Md Jubaer Hossain Pantho, S. Saha, C. Bobda
This paper presents a hardware architecture to extract features from an image using the concepts of bio-inspired computing, together with a method for converting sequential image processing into parallel computational processing units that can execute on the sensor. These computational units are arranged on vertically integrated hierarchical planes and equipped with a region-based Attention Module that separates the Regions of Interest (ROIs) from the image. In each layer, the computational units work in parallel and introduce massive parallelism at the pixel level. At the same time, the design saves dynamic power by dynamically enabling and disabling the computational units while maintaining high performance and high throughput. Moreover, the units are made reconfigurable to support a wide range of machine vision applications by combining a basic structure that is common to all operations with reconfigurable parts for a specific application. Our simulation results show that the design achieves 4.852X power savings on ROIs while processing at 465 Kfps with an 800 MHz clock frequency.
Title: A Reconfigurable Layered-Based Bio-Inspired Smart Image Sensor. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 169-174.
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00059
Henrique Placido, R. Reis
The Lagrangian relaxation (LR) based gate sizer proposed in [1] has the best leakage power results published so far for the ISPD 2012 Gate Sizing Contest benchmarks. However, it requires many LR iterations and does not rely on any technique to perform cell option candidate filtering in the LR subproblem solver. Therefore, this paper presents some extensions to address these drawbacks. In order to reduce the number of LR iterations, we propose some enhancements to the original LR multiplier formula. We also use a scaling factor to properly scale timing cost and leakage power in the LR local cost. Moreover, we apply a cell option candidate filtering strategy to reduce the runtime of each LR iteration. Finally, we improve the post-processing timing recovery and power recovery. Our work achieved leakage power results very close to the original algorithm, taking 4.28x fewer LR iterations, on average, and 9.11x fewer cell swaps during LR, on average.
Title: Tackling the Drawbacks of a Lagrangian Relaxation Based Discrete Gate Sizing Algorithm. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 284-289.
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00122
Yuntao Liu, D. Dachman-Soled, Ankur Srivastava
With the structure of deep neural networks (DNN) being of increasing commercial value, DNN reverse engineering attacks have become a great security concern. It has been shown that the memory access pattern of a processor running DNNs can be exploited to decipher their detailed structure. In this work, we propose a defensive memory access mechanism which utilizes oblivious shuffle, address space layout randomization, and dummy memory accesses to counter such attacks. Experiments show that our defense exponentially increases the attack complexity with asymptotically lower memory access overhead compared to generic memory obfuscation techniques such as ORAM and is scalable to larger DNNs.
Title: Mitigating Reverse Engineering Attacks on Deep Neural Networks. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 657-662.
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00090
A. Fayyazi, Souvik Kundu, Shahin Nazarian, P. Beerel, Massoud Pedram
Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but pose arduous challenges in terms of storage and computation of network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for neural network accelerators based on in-memory computing. At the same time, the presence of a significant number of zero weights in ANNs has motivated research into a variety of parameter reduction techniques. However, for crossbar-based architectures, the study of efficient methods to take advantage of network sparsity is still at an early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets including MNIST handwritten digit recognition, Fashion MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state-of-the-art fully connected memristive neuromorphic circuits, CSrram with only 25% density of weights in the first junction provides a power and area efficiency of 1.5x and 2.6x (averaged over five datasets), respectively, without any significant loss in test accuracy.
Title: CSrram: Area-Efficient Low-Power Ex-Situ Training Framework for Memristive Neuromorphic Circuits Based on Clustered Sparsity. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 465-470.
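A pre-defined block diagonal sparsity pattern can be pictured as a 0/1 mask over a weight matrix in which only weights inside diagonal blocks are kept. The sketch below builds such a mask with equal-sized clusters and measures its density; with k clusters the density is 1/k, so four clusters give the 25% figure quoted for the first junction. The equal-cluster layout and the function names are assumptions made here, since the abstract does not describe the BDC algorithm in detail.

```python
def block_diagonal_mask(n_in, n_out, n_clusters):
    """Return an n_in x n_out 0/1 mask that keeps only diagonal blocks.

    Inputs and outputs are split into `n_clusters` equal groups; a weight
    survives only if its input and output belong to the same group.
    """
    if n_in % n_clusters or n_out % n_clusters:
        raise ValueError("layer sizes must be divisible by the number of clusters")
    rows_per_cluster = n_in // n_clusters
    cols_per_cluster = n_out // n_clusters
    return [
        [1 if r // rows_per_cluster == c // cols_per_cluster else 0 for c in range(n_out)]
        for r in range(n_in)
    ]


def density(mask):
    """Fraction of weights kept by the mask."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total
```

Because each block touches a disjoint set of inputs and outputs, every block could in principle be mapped to its own small crossbar instead of one large, mostly empty array.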
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00024
Nooshin Nosrati, Katayoon Basharkhah, Rezgar Sadeghi, Z. Navabi
This paper focuses on an ESL integrated environment for modeling communication channels at an abstract level and providing a mechanism for insertion of interconnect electrical faults into the channels for coverage analysis. The channels are designed for abstract initiator-target communications and have a general format that contains properties found in SystemC, TLM-1 and TLM-2.0 channels. This paper presents a relatively complex SystemC channel and shows how our suggested mechanism for crosstalk fault modeling can be inserted into the communication lines of the channel. Crosstalk models examined here are 1) at an abstract aggressor-victim level described by programming a SystemC channel, and 2) at the electrical level using SystemC-AMS. Results show correspondence of crosstalk faults at the two levels, and at the same time much faster simulations for the former.
Title: An ESL Environment for Modeling Electrical Interconnect Faults. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 88-93.
Pub Date: 2019-07-01 DOI: 10.1109/ISVLSI.2019.00033
M. Alawad, G. Tourassi
Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially as vocabulary size increases. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of trainable model parameters without noticeably sacrificing accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embedding parameters by 98% and 99% for the Yelp and Amazon datasets, respectively, with no drop in accuracy.
Title: Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing. Published in: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 134-139.