Decomposable Architecture and Fault Mitigation Methodology for Deep Learning Accelerators
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129283
N. Huang, Min-Syue Yang, Ya-Chu Chang, Kai-Chiang Wu
As the demand for data analysis grows rapidly, artificial intelligence (AI) models have been developed for a wide range of applications. Many deep neural networks involve millions or billions of parameters and operations. Consequently, many AI accelerators adopt pipelined architectures with simple but dense computational elements to perform these numerous operations. However, manufacturing-induced faults in such accelerators challenge computational robustness and degrade yield. In this paper, we propose a fault mitigation methodology based on decomposable systolic arrays. By leveraging the inherent error resilience of AI applications, our data arrangement reduces the difference between fault-free and faulty results. Combining the proposed data arrangement with sign compensation further mitigates the influence of faults in AI accelerators. In our experiments, the proposed methodology preserves application accuracy better than state-of-the-art methods: when 0.1% of the multiplier-accumulators in a systolic array are faulty, the array with our fault mitigation methodology incurs less than 0.5% accuracy loss while executing ResNet-18 for ImageNet classification.
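The sketch below makes the fault model concrete. It is a purely functional model of a weight-stationary systolic array, written under our own assumptions. Processing element (PE) (r, c) holds weight w[r][c], and partial sums flow down each column. A faulty multiplier-accumulator is modeled as a stuck-at-zero product. The sketch does not implement the paper's decomposable array, data arrangement, or sign compensation, and the names `systolic_matmul` and `faulty_pes` are hypothetical. It only shows that a single faulty MAC corrupts a whole output column, not just one value.

```python
from typing import AbstractSet, List, Tuple

Matrix = List[List[float]]

def systolic_matmul(x: Matrix, w: Matrix,
                    faulty_pes: AbstractSet[Tuple[int, int]] = frozenset()) -> Matrix:
    """Compute Y = X @ W the way a weight-stationary array would.

    PE (r, c) stores w[r][c]. For each input row, the partial sum of output
    column c accumulates x_row[r] * w[r][c] as it flows down column c.
    A PE listed in faulty_pes contributes 0 instead of its product
    (an assumed stuck-at-zero multiplier fault).
    """
    rows, cols = len(w), len(w[0])
    y = []
    for x_row in x:
        out = []
        for c in range(cols):
            psum = 0.0
            for r in range(rows):
                product = 0.0 if (r, c) in faulty_pes else x_row[r] * w[r][c]
                psum += product
            out.append(psum)
        y.append(out)
    return y

good = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
bad = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], faulty_pes={(0, 1)})
```

Every entry of output column 1 in `bad` is off by x[n][0] * w[0][1]. A fault mitigation scheme has to bound this kind of systematic error.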
Automating Hardware Trojan Detection Using Unsupervised Learning: A Case Study of FPGA
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129335
Shailesh Rajput, Jaya Dofe, Wafi Danesh
Field-programmable gate arrays (FPGAs) are widely used in critical applications such as industrial, medical, automotive, and military systems because they can be dynamically reconfigured at runtime. However, this reconfigurability also raises security concerns: FPGA designs are encoded in a bitstream that adversaries can target for design cloning, IP theft, or hardware Trojan insertion. This work presents a proof of concept for detecting hardware Trojans (HTs) in FPGAs using an unsupervised machine-learning method that eliminates the need for reference models of HTs. The proposed method transforms the configuration bitstream into an encoded vector. This bypasses netlist reconstruction and enables HT detection based solely on the extracted FPGA layout information. Evaluated against various HT attack scenarios, our method accurately detected all infected bitstreams.
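The abstract does not specify the encoding or the unsupervised learner, so this is only a minimal sketch under our own assumptions. Each bitstream is encoded as a normalized 256-bin byte histogram, a stand-in for the paper's encoded vector. Each bitstream is then scored by its mean Euclidean distance to its k nearest neighbors, and the highest-scoring one is flagged as suspicious. The functions `encode_bitstream` and `knn_outlier_scores` are hypothetical. The toy "bitstreams" are synthetic byte strings, not real FPGA configuration data.

```python
import math
from typing import List

def encode_bitstream(bitstream: bytes) -> List[float]:
    """Encode a bitstream as a normalized byte-frequency histogram."""
    counts = [0] * 256
    for byte in bitstream:
        counts[byte] += 1
    return [c / len(bitstream) for c in counts]

def knn_outlier_scores(vectors: List[List[float]], k: int = 2) -> List[float]:
    """Score each vector by its mean distance to its k nearest neighbors."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, u) for j, u in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Synthetic data: four "clean" streams and one with a tampered region.
clean = [bytes((b + shift) % 256 for b in range(256)) * 4 for shift in range(4)]
infected = bytearray(bytes(range(256)) * 4)
infected[:64] = b"\xff" * 64  # hypothetical inserted logic
scores = knn_outlier_scores([encode_bitstream(b) for b in clean + [bytes(infected)]])
suspect = max(range(len(scores)), key=scores.__getitem__)
```

No labeled Trojan examples are needed: the score only measures how far a bitstream sits from its peers.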
Efficient Decryption Architecture for Classic McEliece
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129325
Xinyuan Qiao, Suwen Song, Jing Tian, Zhongfeng Wang
Classic McEliece, one of the candidates evaluated in the National Institute of Standards and Technology (NIST) post-quantum cryptography standardization process, is being widely studied for its strong security. In existing decryption architectures, the Goppa decoder is logic-resource intensive, and the fast Fourier transform (FFT) unit limits the achievable frequency. In this paper, a novel folded Goppa decoder based on an enhanced parallel inversionless Berlekamp-Massey (ePiBM) algorithm is proposed to reduce complexity. A two-dimensional optimization is adopted to eliminate the frequency bottleneck caused by the FFT unit. In addition, an even-power-based computation scheme is presented to reduce the logic resource cost of finite field inversion, a commonly used operation in decryption. Based on these optimizations, a complete decryption architecture is developed and implemented on an Altera Stratix V FPGA. Experimental results show that, compared with the prior art, the proposed decryption processor reduces logic resources by up to 37.6% and Time×Logic by up to 56.9%.
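A common way to invert in GF(2^m) without a dedicated inverter is Fermat's little theorem. For nonzero a, a^(2^m - 1) = 1, so a^(-1) = a^(2^m - 2) = a^2 · a^4 · ... · a^(2^(m-1)). This is a product of even powers that needs only squarings and multiplications. We do not know whether this is exactly the paper's even-power-based scheme; the sketch below only illustrates the identity. It assumes m = 12 and the field polynomial z^12 + z^3 + 1, which to our knowledge is the one used by the smallest Classic McEliece parameter set. The function names are our own.

```python
M = 12
POLY = (1 << 12) | (1 << 3) | 1  # assumed field polynomial z^12 + z^3 + 1

def gf_mul(a: int, b: int) -> int:
    """Carry-less shift-and-add multiplication, reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):  # degree reached m: reduce
            a ^= POLY
    return result

def gf_inv(a: int) -> int:
    """a^(-1) = a^(2^M - 2) = product of a^(2^i) for i = 1 .. M-1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result, power = 1, a
    for _ in range(1, M):
        power = gf_mul(power, power)  # power = a^(2^i), an even power
        result = gf_mul(result, power)
    return result
```

The loop uses M - 1 squarings and M - 1 multiplications and no division. Squaring is linear over GF(2), so hardware can implement it cheaply, which helps explain why even powers suit resource-constrained designs.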
Accounting for Floorplan Irregularity and Configuration Dependence in FPGA Routing Delay Models
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129290
Gabriel Barajas, J. Greene, Fei Li, James Tandon
Accurate delay estimates for a user application implemented in a field-programmable gate array (FPGA) are essential for a high-quality FPGA timing flow; without them, performance is left on the table. FPGA inter-cluster routing consists of wire segments of a limited number of types. These segments repeat in a somewhat regular pattern and are interconnected by configurable muxes. The delay at any fanout of a segment can be significantly affected by configuration-dependent capacitive loading from other fanouts. In addition, inserting RAM and math blocks into the FPGA floorplan stretches wire segments irregularly, altering their delays. We explain why and how commercial FPGA software typically employs a parameterized model for the delay at each fanout of a segment. The model depends on the configuration and the irregularities present, and its parameters are determined by fitting SPICE simulation data for a representative sample of cases. We propose incorporating readily computed common-path resistance values into the model. This enables high accuracy with fewer parameters. It also avoids the large amounts of SPICE data that would otherwise be required to explore interactions between floorplan irregularities and the set of active fanouts. Combined with other features of our models, this reduces segment delay errors by almost half.
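Common-path resistance is the quantity that appears in the classic Elmore delay of an RC tree. The delay to node i is the sum, over every capacitance C_k, of C_k × R_ik, where R_ik is the resistance shared by the source-to-i and source-to-k paths. The sketch below computes that textbook quantity for a toy routing segment. It is not the paper's parameterized model, and the tree, resistances (ohms), and capacitances (femtofarads) are made-up illustration values. It shows how the load on one fanout changes the delay seen at another through their common resistance.

```python
from typing import Dict, Optional, Set

def root_path(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Edges on the driver-to-node path, each edge named by its child node."""
    edges = set()
    while parent[node] is not None:
        edges.add(node)
        node = parent[node]
    return edges

def elmore_delay(target: str, parent: Dict[str, Optional[str]],
                 r_edge: Dict[str, float], cap: Dict[str, float]) -> float:
    """Elmore delay at target: sum over k of C_k * R_common(target, k)."""
    target_path = root_path(target, parent)
    delay = 0.0
    for node, c in cap.items():
        r_common = sum(r_edge[e] for e in target_path & root_path(node, parent))
        delay += r_common * c
    return delay

# Hypothetical segment: driver -> a, then a branches to fanouts f1 and f2.
parent = {"drv": None, "a": "drv", "f1": "a", "f2": "a"}
r_edge = {"a": 100.0, "f1": 200.0, "f2": 300.0}  # ohms, keyed by child node
cap = {"a": 1.0, "f1": 2.0, "f2": 3.0}           # femtofarads
```

Here `elmore_delay("f1", parent, r_edge, cap)` is 1000 ohm·fF (1000 fs). If the f2 fanout is unloaded (its capacitance set to 0), the f1 delay drops to 700 fs, because f2's load acts on f1 only through the 100-ohm common path.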
Attacks on Continuous Chaos Communication and Remedies for Resource Limited Devices
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129355
Rahul Vishwakarma, Ravi Monani, Amin Rezaei, H. Sayadi, Mehrdad Aliasgari, A. Hedayatipour
The global wearable market is expected to grow considerably in the coming years, and communication is a fundamental building block of any wearable device. Encryption for this communication is typically performed in software or with the aid of microcontrollers, approaches that consume considerable power and require complex hardware. Internet of Things (IoT) devices are resource-constrained and are expected to operate with low computational power and limited resource utilization. At the same time, recent research has shown that IoT devices are highly vulnerable to emerging security threats. This increases the need for low-power, small-area hardware-based security countermeasures. Chaotic encryption is a data encryption method that uses chaotic systems and nonlinear dynamics to generate secure encryption keys. It aims to provide a high level of security by creating keys that are sensitive to initial conditions and difficult to predict, making it hard for unauthorized parties to intercept and decode encrypted data. Since the discovery of chaotic equations, various encryption applications have been built on them. In this paper, we comprehensively analyze physical and encryption attacks on continuous chaotic systems in resource-constrained devices, together with their potential remedies. To this end, we introduce different categories of attacks on chaotic encryption. Our experiments focus on chaotic systems implemented with Chua’s equation and its circuit architectures, and they provide simulation evidence for remedies against the different attacks. These remedies prevent attackers from stealing users’ information (e.g., a pulse message) at negligible power and area cost to the design.
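Sensitivity to initial conditions, the property this abstract relies on, is easy to see numerically. The sketch below integrates the standard dimensionless Chua equations (dx/dt = alpha(y - x - f(x)), dy/dt = x - y + z, dz/dt = -beta·y) with a fixed-step fourth-order Runge-Kutta method. The parameter values (alpha = 15.6, beta = 28, m0 = -8/7, m1 = -5/7) are the commonly cited double-scroll values; they are our assumption, not taken from the paper. This is not an encryption scheme and models none of the attacks or remedies. It only shows that two initial states differing by 1e-9 soon produce completely different waveforms, which is why the initial state can serve as key material.

```python
from typing import List, Tuple

State = Tuple[float, float, float]

# Commonly cited double-scroll parameters (assumed, not from the paper).
ALPHA, BETA = 15.6, 28.0
M0, M1 = -8.0 / 7.0, -5.0 / 7.0

def chua_rhs(s: State) -> State:
    x, y, z = s
    # Piecewise-linear characteristic of Chua's diode.
    f = M1 * x + 0.5 * (M0 - M1) * (abs(x + 1.0) - abs(x - 1.0))
    return (ALPHA * (y - x - f), x - y + z, -BETA * y)

def rk4_step(s: State, dt: float) -> State:
    k1 = chua_rhs(s)
    k2 = chua_rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = chua_rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = chua_rhs(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def trajectory_x(s0: State, dt: float = 0.01, steps: int = 20000) -> List[float]:
    """Return the x(t) waveform sampled every dt."""
    xs, s = [], s0
    for _ in range(steps):
        s = rk4_step(s, dt)
        xs.append(s[0])
    return xs

xs_a = trajectory_x((0.7, 0.0, 0.0))
xs_b = trajectory_x((0.7 + 1e-9, 0.0, 0.0))  # perturbed "key"
max_sep = max(abs(a - b) for a, b in zip(xs_a, xs_b))
```

The waveforms stay bounded, yet their separation grows from 1e-9 to the full size of the attractor.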
Neural Network Partitioning for Fast Distributed Inference
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129343
Robert C. Viramontes, A. Davoodi
The rising availability of heterogeneous networked devices highlights new opportunities for distributed artificial intelligence. This work proposes an integer linear programming (ILP) optimization scheme that assigns the layers of a neural network to heterogeneous devices representing edge, hub, and cloud, in order to minimize overall inference latency. The ILP formulation captures a tradeoff between two effects. Executing consecutive layers on the same device avoids communication cost. Weight pre-loading, however, offers a latency benefit when an idle device is waiting to receive the results of an earlier layer across the network. Our experiments show that the layer assignment and inference latency of a neural network can vary significantly depending on the types of devices in the network and their communication bandwidths.
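For a simple chain of layers, the core tradeoff can be illustrated without an ILP solver: stay on one device and avoid transfers, or move to a faster device and pay for communication. The sketch below swaps in a different technique, dynamic programming over layers, rather than the paper's ILP. It deliberately ignores weight pre-loading, which the paper's formulation models. The device names, latencies, activation sizes, and bandwidths are hypothetical.

```python
from typing import Dict, List, Tuple

Bandwidth = Dict[Tuple[str, str], float]

def transfer_time(src: str, dst: str, nbytes: float, bandwidth: Bandwidth) -> float:
    """Seconds to move nbytes between devices; links assumed symmetric."""
    if src == dst:
        return 0.0
    bw = bandwidth[(src, dst)] if (src, dst) in bandwidth else bandwidth[(dst, src)]
    return nbytes / bw

def partition_layers(compute: List[Dict[str, float]], act_bytes: List[float],
                     bandwidth: Bandwidth, source: str) -> Tuple[float, List[str]]:
    """Return (latency, device per layer) minimizing end-to-end latency.

    compute[l][d]: latency of layer l on device d (seconds).
    act_bytes[l]: size of the activation entering layer l (bytes).
    source: device where the network input originates.
    """
    devices = list(compute[0])
    # best[d] = (latency to finish the current layer on d, assignment so far)
    best = {d: (transfer_time(source, d, act_bytes[0], bandwidth) + compute[0][d], [d])
            for d in devices}
    for layer in range(1, len(compute)):
        best = {
            d: min((best[p][0] + transfer_time(p, d, act_bytes[layer], bandwidth)
                    + compute[layer][d], best[p][1] + [d]) for p in devices)
            for d in devices
        }
    return min(best.values())

compute = [{"edge": 4.0, "hub": 2.0, "cloud": 1.0},
           {"edge": 8.0, "hub": 3.0, "cloud": 1.0}]
act_bytes = [1e6, 1e6]
fast = {("edge", "hub"): 1e12, ("edge", "cloud"): 1e12, ("hub", "cloud"): 1e12}
slow = {link: 1e3 for link in fast}
```

With fast links, both layers move to the cloud. With slow links, the transfer cost dominates and both layers stay on the edge device, echoing the abstract's point that the assignment depends on device types and bandwidths.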
A SPICE-based Framework to Emulate Quantum Circuits with classical LC Resonators
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129351
Md. Mazharul Islam, M. S. Hossain, A. Aziz
A circuit system consisting of LC oscillators is mathematically shown to emulate quantum circuits. Here, we develop a SPICE-based framework for emulating universal quantum gates (the phase-shift, Hadamard, and CNOT gates). To reproduce the behavior of each gate, the inductors and capacitors are chosen and precisely tuned. Each quantum state is fully described by the phases and amplitudes of the oscillators. In principle, our framework can simulate a quantum system with an arbitrary number of qubits, since each gate operation is reproduced exactly. Finally, we simulate a three-qubit quantum circuit that contains all of these universal quantum gates. Our simulation results show that the framework can classically emulate the result of any quantum circuit with an arbitrary number of qubits and quantum logic components.
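A reference state-vector model of the three gates is a natural check for an emulator in which each state is described by oscillator amplitudes and phases. Each complex amplitude corresponds to an oscillator's amplitude (its magnitude) and phase (its argument). The sketch below is that idealized mathematical reference, not the SPICE framework. The qubit-to-bit ordering (qubit q is bit q of the basis-state index) is our own convention.

```python
import cmath
import math
from typing import List, Sequence, Tuple

State = List[complex]
Gate = Sequence[Sequence[complex]]

S = 1 / math.sqrt(2)
H: Gate = [[S, S], [S, -S]]  # Hadamard

def phase_shift(phi: float) -> Gate:
    return [[1, 0], [0, cmath.exp(1j * phi)]]

def apply_single(state: State, q: int, gate: Gate) -> State:
    """Apply a 2x2 gate to qubit q (bit q of each basis-state index)."""
    out = list(state)
    for i in range(len(state)):
        if not i & (1 << q):
            j = i | (1 << q)
            a0, a1 = state[i], state[j]
            out[i] = gate[0][0] * a0 + gate[0][1] * a1
            out[j] = gate[1][0] * a0 + gate[1][1] * a1
    return out

def apply_cnot(state: State, control: int, target: int) -> State:
    """Flip the target bit of every basis state whose control bit is 1."""
    out = list(state)
    for i in range(len(state)):
        if i & (1 << control) and not i & (1 << target):
            j = i | (1 << target)
            out[i], out[j] = state[j], state[i]
    return out

def as_oscillators(state: State) -> List[Tuple[float, float]]:
    """(amplitude, phase) pairs an oscillator-based emulator would have to show."""
    return [(abs(a), cmath.phase(a)) for a in state]

bell = apply_cnot(apply_single([1, 0, 0, 0], 0, H), control=0, target=1)
shifted = apply_single([0, 1], 0, phase_shift(math.pi / 2))
```

Here `bell` is the Bell state (|00> + |11>)/√2, and `as_oscillators(shifted)` reports a phase of π/2 on the |1> oscillator.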
Unraveling Latch Locking Using Machine Learning, Boolean Analysis, and ILP
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129346
Dake Chen, Xuan Zhou, Yinghua Hu, Yuke Zhang, Kaixin Yang, A. Rittenbach, P. Nuzzo, P. Beerel
Logic locking has become a promising approach to providing hardware security in the face of a possibly insecure fabrication supply chain. While many techniques focus on locking combinational logic (CL), an alternative latch-locking approach, in which the sequential elements are locked, has also gained significant attention. Latch (LAT) locking duplicates a subset of the flip-flops (FFs) of a design, retimes these FFs and replaces them with latches, and adds two types of decoy latches to obfuscate the netlist. It then adds control circuitry (CC) such that all latches must be correctly keyed for the circuit to function correctly. This paper presents a two-phase attack on latch-locked circuits that uses a novel combination of deep learning, Boolean analysis, and integer linear programming (ILP). The attack requires access to the reverse-engineered netlist but, unlike SAT attacks, is oracle-less: it needs neither the unlocked circuit nor correct input/output pairs. We trained and evaluated the attack on the ISCAS’89 and ITC’99 benchmark circuits. On average, the attack identifies a key that is 96.9% accurate. It fully discloses the correct functionality in 8 of the 19 tested circuits and leads to low function corruptibility (less than 4%) in 3 additional circuits. The attack run-times are manageable.
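The two figures of merit in this abstract, key accuracy and function corruptibility, can be stated precisely. The sketch below computes both for a toy, purely combinational locked function. Key accuracy is the fraction of correctly guessed key bits. Corruptibility is the fraction of input patterns for which the locked function under a given key differs from the original. Real latch-locked circuits are sequential, and the paper's exact definitions may differ. The circuit `toy_locked` and all function names are hypothetical.

```python
from typing import Callable, Sequence

def key_accuracy(guess: Sequence[int], correct: Sequence[int]) -> float:
    """Fraction of key bits that match the correct key."""
    return sum(g == c for g, c in zip(guess, correct)) / len(correct)

def function_corruptibility(original: Callable[[int], int],
                            locked: Callable[[int, Sequence[int]], int],
                            key: Sequence[int], n_inputs: int) -> float:
    """Fraction of the 2**n_inputs input patterns whose output is wrong."""
    patterns = range(2 ** n_inputs)
    wrong = sum(original(x) != locked(x, key) for x in patterns)
    return wrong / len(patterns)

def toy_original(x: int) -> int:
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2

def toy_locked(x: int, key: Sequence[int]) -> int:
    # key[0] == 1 is correct; key[0] == 0 flips the output only
    # when x0 = x1 = x2 = 1, i.e. for 1 of the 8 input patterns.
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return toy_original(x) ^ ((key[0] ^ 1) & x0 & x1 & x2)
```

A wrong key can still yield low corruptibility (here 1/8). That is why the abstract reports corruptibility separately from the 96.9% key accuracy.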
XOR-CiM: An Efficient Computing-in-SOT-MRAM Design for Binary Neural Network Acceleration
Pub Date: 2023-04-05 | Venue: 2023 24th International Symposium on Quality Electronic Design (ISQED) | DOI: 10.1109/ISQED57927.2023.10129322
M. Morsali, Ranyang Zhou, Sepehr Tabrizchi, A. Roohi, Shaahin Angizi
In this work, we leverage the unipolar switching behavior of Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) to develop an efficient digital Computing-in-Memory (CiM) platform named XOR-CiM. XOR-CiM converts typical MRAM sub-arrays into massively parallel computational cores with ultra-high bandwidth. This greatly reduces the energy consumed by convolutional layers and accelerates inference of X(N)OR-intensive Binary Neural Networks (BNNs). With inference accuracy similar to that of other digital CiMs, XOR-CiM achieves ∼4.5× higher energy efficiency and a 1.8× speed-up compared with recent MRAM-based CiM platforms.
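The X(N)OR operations that XOR-CiM accelerates come from the standard binary-neural-network identity. Store weights and activations in {-1, +1} as bits (1 for +1, 0 for -1). The dot product of two n-element vectors then equals 2 · popcount(XNOR(a, b)) - n. The sketch below checks that identity in software. It says nothing about how XOR-CiM maps the computation onto SOT-MRAM sub-arrays, and the bit-packing convention is our assumption.

```python
from typing import List

def pack(values: List[int]) -> int:
    """Pack a {-1, +1} vector into an int: element i -> bit i (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Binary dot product via XNOR and popcount."""
    mask = (1 << n) - 1
    matches = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the signs agree
    return 2 * bin(matches).count("1") - n

def naive_dot(a: List[int], b: List[int]) -> int:
    return sum(x * y for x, y in zip(a, b))

a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
```

Here `xnor_popcount_dot(pack(a), pack(b), 5)` equals `naive_dot(a, b)`, which is 1. Replacing multiply-accumulate with XNOR and bit counting is what makes in-memory XOR logic attractive for BNN inference.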