Dynamic-HDC: A Two-Stage Dynamic Inference Framework for Brain-Inspired Hyperdimensional Computing
Pub Date: 2023-10-31 | DOI: 10.1109/JETCAS.2023.3328857
Yu-Chuan Chuang;Cheng-Yang Chang;An-Yeu Wu
Brain-inspired hyperdimensional computing (HDC) has attracted attention due to its energy efficiency and noise resilience in various IoT applications. However, striking the right balance between accuracy and efficiency in HDC remains a challenge. Specifically, HDC represents data as high-dimensional vectors known as hypervectors (HVs), where each component of an HV can be a high-precision integer or a low-cost bipolar number (+1/−1). This choice presents HDC with a significant trade-off between accuracy and efficiency. To address this challenge, we propose a two-stage dynamic inference framework called Dynamic-HDC that offers IoT applications a more flexible solution than choosing between the two extremes. Dynamic-HDC leverages the strategies of early exit and model parameter adaptation. Unlike prior works that use a single HDC model to classify all data, Dynamic-HDC employs a cascade of models for two-stage inference. The first stage uses a low-cost, low-precision bipolar model, while the second stage uses a high-cost, high-precision integer model. Dynamic-HDC thus saves computational resources on easy samples by exiting early when the low-cost bipolar model is highly confident in its classification; for difficult samples, the high-precision integer model is conditionally activated to produce more accurate predictions. To further enhance efficiency, we introduce dynamic dimension selection (DDS) and dynamic class selection (DCS), which let the framework dynamically adapt the dimensionality and the number of classes in the HDC model. We evaluate Dynamic-HDC on three benchmarks commonly used in HDC research: MNIST, ISOLET, and UCIHAR. Our simulation results demonstrate that Dynamic-HDC with different configurations reduces energy consumption by 19.8-51.1% and execution time by 22.5-49.9% with a negligible 0.02-0.36% accuracy degradation compared to a single integer model. Compared to a single bipolar model, Dynamic-HDC improves accuracy by 3.1% with a modest 10% energy and 14% execution-time overhead.
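For context, the early-exit cascade the abstract describes can be sketched in a few lines of NumPy. The model shapes, the top-2 similarity margin used as the confidence measure, and the threshold value below are illustrative assumptions, not details from the paper:

```python
# Minimal sketch of two-stage dynamic inference with early exit, assuming
# cosine-similarity HDC classification; dimensions, the margin-based confidence
# measure, and the threshold are illustrative, not taken from the paper.
import numpy as np

D, C = 4096, 10                      # hypervector dimension, number of classes
rng = np.random.default_rng(0)

int_class_hvs = rng.integers(-8, 9, size=(C, D)).astype(np.float32)  # integer model
bip_class_hvs = np.sign(int_class_hvs + 0.5)                         # bipolar model

def classify(query_hv, class_hvs):
    """Return (predicted class, similarity margin between the top-2 classes)."""
    sims = class_hvs @ query_hv / (
        np.linalg.norm(class_hvs, axis=1) * np.linalg.norm(query_hv) + 1e-12)
    top2 = np.sort(sims)[-2:]
    return int(np.argmax(sims)), float(top2[1] - top2[0])

def dynamic_hdc_inference(query_hv, margin_threshold=0.05):
    # Stage 1: cheap bipolar model; exit early if it is confident enough.
    pred, margin = classify(np.sign(query_hv), bip_class_hvs)
    if margin >= margin_threshold:
        return pred, "early-exit (bipolar)"
    # Stage 2: conditionally activate the high-precision integer model.
    pred, _ = classify(query_hv, int_class_hvs)
    return pred, "integer model"

query = rng.integers(-8, 9, size=D).astype(np.float32)
print(dynamic_hdc_inference(query))
```

Raising the margin threshold sends more samples through to the integer model, trading energy and latency for accuracy, which is the knob behind the configuration ranges reported above.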
{"title":"Dynamic-HDC: A Two-Stage Dynamic Inference Framework for Brain-Inspired Hyperdimensional Computing","authors":"Yu-Chuan Chuang;Cheng-Yang Chang;An-Yeu Wu","doi":"10.1109/JETCAS.2023.3328857","DOIUrl":"10.1109/JETCAS.2023.3328857","url":null,"abstract":"Brain-inspired hyperdimensional computing (HDC) has attracted attention due to its energy efficiency and noise resilience in various IoT applications. However, striking the right balance between accuracy and efficiency in HDC remains a challenge. Specifically, HDC represents data as high-dimensional vectors known as hypervectors (HVs), where each component of HVs can be a high-precision integer or a low-cost bipolar number (+1/−1). However, this choice presents HDC with a significant trade-off between accuracy and efficiency. To address this challenge, we propose a two-stage dynamic inference framework called Dynamic-HDC that offers IoT applications a more flexible solution rather than limiting them to choose between the two extreme options. Dynamic-HDC leverages the strategies of early exit and model parameter adaptation. Unlike prior works that use a single HDC model to classify all data, Dynamic-HDC employs a cascade of models for two-stage inference. The first stage involves a low-cost, low-precision bipolar model, while the second stage utilizes a high-cost, high-precision integer model. By doing so, Dynamic-HDC can save computational resources for easy samples by performing an early exit when the low-cost bipolar model exhibits high confidence in its classification. For difficult samples, the high-precision integer model is conditionally activated to achieve more accurate predictions. To further enhance the efficiency of Dynamic-HDC, we introduce dynamic dimension selection (DDS) and dynamic class selection (DCS). These techniques enable the framework to dynamically adapt the dimensions and the number of classes in the HDC model, further optimizing performance. We evaluate the effectiveness of Dynamic-HDC on three commonly used benchmarks in HDC research, namely MNIST, ISOLET, and UCIHAR. Our simulation results demonstrate that Dynamic-HDC with different configurations can reduce energy consumption by 19.8-51.1% and execution time by 22.5-49.9% with negligible 0.02-0.36 % accuracy degradation compared to a single integer model. Compared to a single bipolar model, Dynamic-HDC improves 3.1% accuracy with a slight 10% energy and 14% execution time overhead.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GEBA: Gradient-Error-Based Approximation of Activation Functions
Pub Date: 2023-10-31 | DOI: 10.1109/JETCAS.2023.3328890
Changmin Ye;Doo Seok Jeong
Computing-in-memory (CIM) macros that aim to accelerate deep learning operations at low power need activation function (AF) units on the same die to reduce their host dependency. Versatile CIM macros therefore need reconfigurable AF units that combine high precision with efficient hardware usage. To this end, we propose the gradient-error-based approximation (GEBA) of AFs, which approximates various types of AFs in discrete input domains at high precision. GEBA reduces the approximation error by approximately 49.7%, 67.3%, 81.4%, and 60.1% (for sigmoid, tanh, GELU, and swish in FP32, respectively), compared with a uniform input-based approximation using the same memory as GEBA.
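As a point of reference, the uniform input-based approximation that GEBA is compared against can be sketched as a piecewise-linear lookup table with evenly spaced breakpoints. The abstract does not detail GEBA's own breakpoint-selection rule, so only this baseline is shown; the input range, segment count, and the choice of GELU are illustrative:

```python
# Minimal sketch of the uniform input-based piecewise-linear (LUT) baseline
# named in the abstract; GEBA itself places breakpoints differently, but that
# rule is not specified here.
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def pwl_lut(f, lo, hi, segments):
    """Build a piecewise-linear LUT with uniformly spaced breakpoints."""
    xs = np.linspace(lo, hi, segments + 1)
    return xs, f(xs)

def pwl_eval(x, xs, ys):
    """Evaluate the PWL approximation by linear interpolation between breakpoints."""
    return np.interp(x, xs, ys)

xs, ys = pwl_lut(gelu, -6.0, 6.0, segments=32)   # 33-entry LUT
x = np.linspace(-6.0, 6.0, 10_000)
err = np.abs(gelu(x) - pwl_eval(x, xs, ys))
print(f"max error {err.max():.3e}, mean error {err.mean():.3e}")
```

GEBA's reported gains are relative to this kind of equal-memory uniform LUT, so any breakpoint placement that tracks where the function is hardest to fit linearly would reduce the errors printed above.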
{"title":"GEBA: Gradient-Error-Based Approximation of Activation Functions","authors":"Changmin Ye;Doo Seok Jeong","doi":"10.1109/JETCAS.2023.3328890","DOIUrl":"10.1109/JETCAS.2023.3328890","url":null,"abstract":"Computing-in-memory (CIM) macros aiming at accelerating deep learning operations at low power need activation function (AF) units on the same die to reduce their host-dependency. Versatile CIM macros need to include reconfigurable AF units at high precision and high efficiency in hardware usage. To this end, we propose the gradient-error-based approximation (GEBA) of AFs, which approximates various types of AFs in discrete input domains at high precision. GEBA reduces the approximation error by ca. 49.7%, 67.3%, 81.4%, 60.1% (for sigmoid, tanh, GELU, swish in FP32), compared with the uniform input-based approximation using the same memory as GEBA.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Operating Coupled VO₂-Based Oscillators for Solving Ising Models
Pub Date: 2023-10-31 | DOI: 10.1109/JETCAS.2023.3328887
Maria J. Avedillo;Manuel Jiménez Través;Corentin Delacour;Aida Todri-Sanial;Bernabé Linares-Barranco;Juan Núñez
Coupled nano-oscillators are attracting increasing interest because of their potential to perform computation efficiently, enabling new applications in computing and information processing. The potential of phase-transition devices for such dynamical systems has recently been recognized. This paper investigates the implementation of coupled VO2-based oscillator networks to solve combinatorial optimization problems. The target problem is mapped to an Ising model, which is solved by the synchronization dynamics of the system. We analyze the factors that affect the probability of the system reaching the ground state of the Ising Hamiltonian and, therefore, the optimum solution of the corresponding optimization problem. This simulation-based analysis leads us to propose a novel Second-Harmonic Injection Locking (SHIL) schedule, whose main feature is that the SHIL signal amplitude is repeatedly and smoothly ramped up and down: reducing the SHIL strength is the mechanism that lets the system escape local minimum-energy states. Our experiments show higher success probabilities than previously reported approaches, and an experimental Oscillatory Ising Machine (OIM) has been built to validate the proposal.
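The dynamics at play can be illustrated with the standard Kuramoto-style phase model of an oscillator Ising machine, with a SHIL amplitude that is repeatedly ramped up and down as the proposed schedule describes. The problem size, coupling matrix, schedule shape, and step counts below are illustrative, and the paper's device-level VO2 oscillator model is far more detailed than this abstraction:

```python
# Minimal sketch of a Kuramoto-style oscillator Ising machine with an annealed
# SHIL amplitude; all constants are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
N = 8
J = rng.choice([-1.0, 1.0], size=(N, N))
J = np.triu(J, 1); J = J + J.T                   # symmetric couplings, zero diagonal

def ising_energy(spins):
    return -0.5 * spins @ J @ spins              # H = -sum_{i<j} J_ij s_i s_j

phases = rng.uniform(0.0, 2.0 * np.pi, N)
dt, steps, cycles = 0.05, 4000, 4
best = np.inf
for t in range(steps):
    # SHIL amplitude repeatedly ramped up and down: low strength lets the system
    # escape local minima; high strength binarizes phases toward 0 or pi.
    ks = 0.5 * (1.0 - np.cos(2.0 * np.pi * cycles * t / steps))
    # Phase dynamics descend E(phi) = -sum_{i<j} J_ij cos(phi_i - phi_j),
    # which coincides with the Ising energy once phases are binarized.
    coupling = (J * np.sin(phases[:, None] - phases[None, :])).sum(axis=1)
    phases += dt * (-coupling - ks * np.sin(2.0 * phases))
    spins = np.sign(np.cos(phases))              # read out spins from phases
    best = min(best, ising_energy(spins))
print("best Ising energy found:", best)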
{"title":"Operating Coupled VO₂-Based Oscillators for Solving Ising Models","authors":"Maria J. Avedillo;Manuel Jiménez Través;Corentin Delacour;Aida Todri-Sanial;Bernabé Linares-Barranco;Juan Núñez","doi":"10.1109/JETCAS.2023.3328887","DOIUrl":"10.1109/JETCAS.2023.3328887","url":null,"abstract":"Coupled nano-oscillators are attracting increasing interest because of their potential to perform computation efficiently, enabling new applications in computing and information processing. The potential of phase transition devices for such dynamical systems has recently been recognized. This paper investigates the implementation of coupled VO2-based oscillator networks to solve combinatorial optimization problems. The target problem is mapped to an Ising model, which is solved by the synchronization dynamics of the system. Different factors that impact the probability of the system reaching the ground state of the Ising Hamiltonian and, therefore, the optimum solution to the corresponding optimization problem, are analyzed. The simulation-based analysis has led to the proposal of a novel Second-Harmonic Injection Locking (SHIL) schedule. Its main feature is that SHIL signal amplitude is repeatedly smoothly increased and decreased. Reducing SHIL strength is the mechanism that enables escaping from local minimum energy states. Our experiments show better results in terms of success probability than previously reported approaches. An experimental Oscillatory Ising Machine (OIM) has been built to validate our proposal.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ESSM: Extended Synaptic Sampling Machine With Stochastic Echo State Neuro-Memristive Circuits
Pub Date: 2023-10-31 | DOI: 10.1109/JETCAS.2023.3328875
Vineeta V. Nair;Chithra Reghuvaran;Deepu John;Bhaskar Choubey;Alex James
Synaptic stochasticity is an important feature of biological neural networks that has not been widely explored in analog memristor networks. The Synaptic Sampling Machine (SSM) is a recent neural network model that highlights the importance of synaptic stochasticity. In this paper, we present a memristive Echo State Network (ESN) with an Extended SSM (ESSM). We present the circuit-level design of a single synaptic sampling cell that introduces stochasticity into the neural network, and propose an architecture of synaptic sampling cells that can adaptively reprogram the arrays and respond to stimuli of varying strengths. The effect of stochasticity is achieved by randomly blocking the input with a probability that follows a Bernoulli distribution, which can reduce the memory capacity requirements. The blocking signals are generated randomly using Circular Shift Registers (CSRs). Network processing is handled in the analog domain, and training is performed offline. The performance of the neural network is analyzed to benchmark hardware performance without compromising system performance. The neural system was tested on the ECG, MNIST, Fashion-MNIST, and CIFAR10 classification datasets. We present the advantage of the memristive CSR over a conventional CMOS-based CSR, evaluate ESSM-ESN under device variations such as resistance variation, noise, and quantization, and demonstrate its advantages in performance and power requirements compared with other neural architectures.
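The Bernoulli input-blocking idea can be sketched inside a conventional leaky echo state network update. The dimensions, blocking probability, and leak rate below are illustrative, and the paper generates the blocking signals with memristive CSRs in the analog domain rather than a software RNG:

```python
# Minimal sketch of SSM-style Bernoulli input blocking in a leaky ESN update;
# p_block, leak, and all sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_res = 16, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius below 1

def essm_esn_step(state, u, p_block=0.3, leak=0.3):
    # Stochastic synaptic sampling: each input component is blocked (zeroed)
    # independently with Bernoulli probability p_block.
    mask = rng.random(u.shape) >= p_block
    pre = W_in @ (u * mask) + W @ state
    return (1.0 - leak) * state + leak * np.tanh(pre)

state = np.zeros(n_res)
for _ in range(100):                # drive the reservoir with a random input stream
    state = essm_esn_step(state, rng.uniform(-1.0, 1.0, n_in))
print("reservoir state norm:", np.linalg.norm(state))
```

In hardware, replacing the software mask with CSR-generated blocking signals keeps the stochastic sampling while avoiding a dedicated random number generator, which is the design choice the paper advocates.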
{"title":"ESSM: Extended Synaptic Sampling Machine With Stochastic Echo State Neuro-Memristive Circuits","authors":"Vineeta V. Nair;Chithra Reghuvaran;Deepu John;Bhaskar Choubey;Alex James","doi":"10.1109/JETCAS.2023.3328875","DOIUrl":"10.1109/JETCAS.2023.3328875","url":null,"abstract":"Synaptic stochasticity is an important feature of biological neural networks that is not widely explored in analog memristor networks. Synaptic Sampling Machine (SSM) is one of the recent models of the neural network that explores the importance of the synaptic stochasticity. In this paper, we present a memristive Echo State Network (ESN) with Extended-SSM (ESSM). The circuit-level design of the single synaptic sampling cell that can introduce stochasticity to the neural network is presented. The architecture of synaptic sampling cells is proposed that have the ability to adaptively reprogram the arrays and respond to stimuli of various strengths. The effect of stochasticity is achieved by randomly blocking the input with the probability that follows Bernoulli distribution, and can lead to the reduction of the memory capacity requirements. The blocking signals are randomly generated using Circular Shift Registers (CSRs). The network processing is handled in analog domain and the training is performed offline. The performance of the neural network is analyzed with a view to benchmark for hardware performance without compromising the system performance. The neural system was tested on ECG, MNIST, Fashion MNIST and CIFAR10 dataset for classification problem. The advantage of memristive CSR in comparison with conventional CMOS based CSR is presented. The ESSM-ESN performance is evaluated with the effect of device variations like resistance variations, noise and quantization. The advantage of ESSM-ESN is demonstrated in terms of performance and power requirements in comparison with other neural architectures.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":null,"pages":null},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10302278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking neural networks (SNNs) are well-suited for neuromorphic hardware due to their biological plausibility and energy efficiency. These networks utilize sparse, asynchronous spikes for communication and can be binarized. However, training such networks presents several challenges due to their non-differentiable activation function and binarized inter-layer data movement. The well-established backpropagation through time (BPTT) algorithm used to train SNNs encounters notable difficulties because of its substantial memory consumption and extensive computational demands, which restrict its practical utility in real-world scenarios. Effective techniques are therefore required to train such networks efficiently while preserving accuracy. In this paper, we propose Binarized Spike Timing Dependent Gradient (BSTDG), a novel method that utilizes presynaptic and postsynaptic timings to bypass the non-differentiable gradient and the need for BPTT. Additionally, we employ binarized weights with a threshold training strategy to enhance energy savings and performance. Moreover, we exploit latency/temporal-based coding and the Integrate-and-Fire (IF) model to achieve significant computational advantages. We evaluate the proposed method on Caltech101 Face/Motorcycle, MNIST, Fashion-MNIST, and Spiking Heidelberg Digits. The results demonstrate that the accuracy attained surpasses that of existing BSNNs and single-spike networks under the same structure. Furthermore, the proposed model achieves up to 30×
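The latency (time-to-first-spike) coding and Integrate-and-Fire model named in the abstract can be sketched as follows; the BSTDG weight update itself is not specified in the abstract, so only these two building blocks are shown, and the encoding window, threshold, and weight range are illustrative assumptions:

```python
# Minimal sketch of latency coding feeding a simple Integrate-and-Fire neuron;
# t_max, threshold, and weight scales are illustrative.
import numpy as np

def encode_latency(x, t_max=100):
    """Latency coding: stronger inputs spike earlier (one spike per input)."""
    x = np.clip(x, 1e-6, 1.0)
    return np.round((1.0 - x) * t_max).astype(int)

def if_first_spike_time(in_times, weights, threshold=1.0, t_max=100):
    """Integrate weighted presynaptic spikes over time; return the first step
    at which the membrane potential crosses threshold (t_max if it never fires)."""
    v = 0.0
    for t in range(t_max + 1):
        v += np.sum(weights[in_times == t])   # add charge from spikes arriving at t
        if v >= threshold:
            return t
    return t_max

rng = np.random.default_rng(3)
x = rng.random(32)                    # normalized input intensities
w = rng.uniform(0.0, 0.2, 32)
t_in = encode_latency(x)
print("output spike time:", if_first_spike_time(t_in, w))
```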