Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168605
Jingyu He, Ziyang Shen, Fengshi Tian, Jinbo Chen, Jie Yang, M. Sawan, Hsiang-Ting Chen, P. Bogdan, C. Tsui
We propose SNNOpt, a systematic application-specific hardware design methodology for Spiking Neural Networks (SNNs), which consists of three novel phases: 1) an Ollivier-Ricci-curvature (ORC)-based, architecture-aware network partitioning; 2) a reinforcement-learning mapping strategy; and 3) a Bayesian optimization algorithm for NoC design-space exploration. Experimental results show that SNNOpt achieves 47.45% lower runtime and 58.64% energy savings over state-of-the-art approaches.
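The third phase, Bayesian optimization over NoC parameters, can be sketched as a surrogate-guided search loop. Everything below is illustrative: the `latency` objective is a made-up stand-in for a NoC simulator, and the one-parameter search space and minimal Gaussian-process surrogate are simplifications, not SNNOpt's actual cost model.

```python
import numpy as np

def latency(buffer_depth):
    # Hypothetical objective: latency improves with buffer depth up to a point.
    return (buffer_depth - 6.0) ** 2 + 3.0

def gp_posterior(X, y, Xs, ls=2.0, noise=1e-6):
    """GP posterior mean/std with an RBF kernel (no hyperparameter fitting)."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

candidates = np.arange(1.0, 17.0)          # buffer depths 1..16
X = np.array([1.0, 16.0])                  # initial design points
y = np.array([latency(x) for x in X])
for _ in range(6):
    mu, sd = gp_posterior(X, y, candidates)
    score = mu - 1.5 * sd                  # lower-confidence bound (minimizing)
    nxt = candidates[np.argmin(score)]     # most promising unexplored config
    X = np.append(X, nxt)
    y = np.append(y, latency(nxt))
best = X[np.argmin(y)]
print(best, min(y))
```

The surrogate lets each simulated configuration inform the choice of the next one, which is the point of using BO instead of exhaustive sweeps over the NoC design space.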
Title: SNNOpt: An Application-Specific Design Framework for Spiking Neural Networks
Event cameras, with their high temporal resolution and high dynamic range, hold great potential for computer vision (CV) tasks. To use deep neural networks directly, an efficient reconstruction method that converts event-based data to frame-based data is necessary. In this work, the interpretable Event Represented Intensity (ERI) model, which recovers the logarithm of the intensity sensed by a dynamic vision pixel, is proposed for the first time. The amplitude-frequency characteristic of the recovered log intensity is used to construct the frame-based image for CV tasks. Experimental results on the N-Caltech101 dataset show that the proposed ERI model achieves a classification accuracy of 79.20%, striking a better balance between performance and computation cost.
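The principle the ERI model builds on can be shown with a minimal event-integration baseline (this is the generic dynamic-vision-sensor model, not the paper's ERI formulation): a pixel emits an event whenever log intensity moves by a contrast threshold C, so summing signed events recovers log intensity up to an offset. Units and the threshold value below are arbitrary.

```python
import numpy as np

C = 4.0                                 # contrast threshold (assumed, arbitrary units)
log_I = np.arange(50, dtype=float)      # ground-truth log-intensity ramp

events, ref = [], log_I[0]
for t, L in enumerate(log_I):
    while L - ref >= C:                 # brightness-increase (ON) events
        ref += C
        events.append((t, +1))
    while ref - L >= C:                 # brightness-decrease (OFF) events
        ref -= C
        events.append((t, -1))

# Integrating the signed events recovers the final log intensity
# to within one threshold step of the ground truth.
recon_end = log_I[0] + C * sum(p for _, p in events)
err = abs(recon_end - log_I[-1])
print(len(events), err)
```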
Title: An Interpretable Pixel Intensity Reconstruction Model for Asynchronous Event Camera
Authors: Hongwei Shan, Lichen Feng, Yueqi Zhang, Zhangming Zhu
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168635
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168610
Udari De Alwis, Zhongheng Xie, Massimo Alioto
Recognizing human actions in video sequences has become an essential task in video surveillance applications, where transformer models have rapidly gained interest thanks to their performance. However, their advantages come at a high computational and memory cost, especially when they must be deployed on edge devices. In this work, temporal similarity tunnel insertion is used to reduce the overall computation burden of video transformer networks in action recognition tasks. Furthermore, an edge-friendly video transformer model based on temporal similarity is proposed, which substantially reduces computation cost. Its smaller variant, EMViT, achieves a 38% computation reduction on the UCF101 dataset with insignificant accuracy degradation (<0.02%), while the larger variant, CMViT, reduces computation by 14% (13%) with an accuracy degradation of 2% (3%) on the scaled Kinetics400 and Jester datasets.
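The abstract does not detail the "temporal similarity tunnel" mechanism, but the underlying idea of temporal-similarity computation skipping can be sketched: patch tokens that barely change between consecutive frames reuse the cached output of an expensive per-token transform instead of recomputing it. The threshold, transform, and token layout below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_transform(tok):
    # Stand-in for per-token attention/MLP work.
    return np.tanh(tok * 2.0)

def process_video(frames, thresh=0.05):
    cache_in, cache_out, computed = None, None, 0
    outputs = []
    for f in frames:                       # f: (num_tokens, dim)
        if cache_in is None:
            out = expensive_transform(f)
            computed += len(f)
        else:
            diff = np.linalg.norm(f - cache_in, axis=1)
            out = cache_out.copy()
            stale = diff > thresh          # recompute only changed tokens
            out[stale] = expensive_transform(f[stale])
            computed += int(stale.sum())
        cache_in, cache_out = f, out
        outputs.append(out)
    return outputs, computed

base = rng.standard_normal((16, 8))
frames = [base, base.copy(), base + 0.001]   # a mostly static scene
outs, computed = process_video(frames)
print(computed)   # tokens computed vs. 48 without reuse
```

For near-static surveillance footage, most tokens are reused across frames, which is where the computation reduction comes from.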
Title: Temporal Similarity-Based Computation Reduction for Video Transformers in Edge Camera Nodes
Adventitious cardiopulmonary (lung and heart) sound detection and classification through a digital stethoscope play a vital role in early diagnosis and telehealth services. However, automatically detecting adventitious sounds is challenging, since lung and heart sounds are susceptible to each other's interference and to noise. In this paper, for the first time, we simultaneously classify adventitious lung and heart sounds using our proposed LungHeart-AtMe model, trained on a mixed dataset built from the ICBHI 2017 lung sound dataset and the PhysioNet 2016 heart sound dataset. Based on the characteristics of lung and heart sounds, Wavelet Decomposition is first applied for noise reduction; then two time-frequency feature extraction techniques, the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCCs), extract preliminary features and transform the sounds into spectrograms that are easy to analyze. The LungHeart-AtMe model is further improved by introducing an MMoE structure and by using an attention-based CNN to extend its global feature extraction capability. In our experiments, LungHeart-AtMe achieves a Sensitivity of 71.55% and a Specificity of 28.06% for cardiopulmonary sound classification.
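The first of the two time-frequency features, the STFT power spectrogram, can be computed in a few lines (MFCCs would additionally apply a mel filterbank and DCT on top of such a spectrogram). The FFT size, hop, and sampling rate below are illustrative choices, not the paper's settings.

```python
import numpy as np

def stft_spectrogram(x, n_fft=256, hop=128):
    # Hann-windowed frames -> power spectrum per frame.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop : i*hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 4000                                  # Hz, an assumed stethoscope rate
t = np.arange(fs) / fs                     # 1 s test signal
x = np.sin(2*np.pi*100*t) + 0.5*np.sin(2*np.pi*400*t)
S = stft_spectrogram(x)
peak_bins = S.argmax(axis=1)               # dominant frequency bin per frame
print(S.shape)
```

Each row of `S` is one time slice, so the result is the kind of 2-D spectrogram image a CNN-style classifier can consume directly.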
Title: LungHeart-AtMe: Adventitious Cardiopulmonary Sounds Classification Using MMoE with STFT and MFCCs Spectrograms
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168562
Linyu Zhu, Yue Gu, Xinfei Guo
As interconnect delay becomes more dominant than gate delay in timing paths, accurate yet fast estimation of wire delay during the signoff stage is required. Prior machine-learning-based wire delay estimation approaches either relied on tedious feature extraction processes or failed to capture net topology information, incurring long turnaround times. In this paper, we propose to leverage graph neural networks (GNNs) to estimate interconnect delays during signoff. Unlike other GNN-assisted timing analysis methods, which are usually applied to a netlist, we harness global message-passing graph representation learning directly on the RC graph to perform ultra-fast net delay estimation without requiring extra features. Furthermore, pre-processed graph features can be added to boost estimation accuracy with a slight runtime penalty. Our customized GNN models have been evaluated on an industrial design and compared against the state-of-the-art ML-based wire delay estimator: the proposed model outperforms it by 4x in runtime while achieving similar accuracy.
Title: RC-GNN: Fast and Accurate Signoff Wire Delay Estimation with Customized Graph Neural Networks
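One round of message passing over a tiny RC tree illustrates the kind of information flow this relies on: node capacitances and edge resistances are the raw parasitics, and neighbors exchange resistance-scaled messages, so no hand-crafted features are needed. The aggregation rule and toy values are assumptions; the actual RC-GNN layer is not specified in the abstract.

```python
import numpy as np

edges = [(0, 1), (1, 2), (1, 3)]        # driver node 0 -> sink nodes 2, 3
res = dict(zip(edges, [10.0, 20.0, 30.0]))   # edge resistances (ohms)
cap = np.array([0.0, 1.0, 2.0, 1.5])         # node capacitances (fF)

h = np.stack([cap, np.zeros_like(cap)], axis=1)  # initial node state
msgs = np.zeros_like(h)
deg = np.zeros(len(cap))
for (u, v), r in res.items():            # messages flow in both directions
    for a, b in ((u, v), (v, u)):
        msgs[b] += h[a] / r              # edge resistance scales the message
        deg[b] += 1
# concatenate each node's state with its mean-aggregated neighborhood message
h_next = np.concatenate([h, msgs / np.maximum(deg, 1)[:, None]], axis=1)
print(h_next.shape)
```

Stacking several such rounds lets every node see the whole net topology, which is what "global message passing" buys over per-node features.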
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168604
Yuandong Li, Li Du, Yuan Du
Computing in Memory (CiM), a non-von Neumann computing architecture, has been reported as one of the most promising future neural network accelerators. Compared with digital computation, CiM uses RAM arrays to compute and store in the analog domain, avoiding the high delay and energy consumption of data transfer. However, the computational results require data converters for quantization, which often limits high-performance CiM designs. In this work, we propose a 2-8 bit reconfigurable, time-interleaved hybrid ADC architecture for high-speed CiMs, comprising successive-approximation and single-slope stages. Reconfigurability introduces a trade-off between resolution and conversion speed for different computing scenarios. A prototype implemented in a 55 nm CMOS technology occupies 330 μm × 13 μm and consumes 1.429 mW in 8-bit conversion mode. With a Nyquist-frequency input sampled at 350 MS/s, the SNDR and SFDR are 40.93 dB and 51.08 dB, respectively, and the resulting Walden figure of merit is 44.8 fJ/conv.
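A behavioral model makes the SAR/SS split concrete: the SAR stage binary-searches the MSBs, and a single-slope counter ramps through the residue for the LSBs. The 4+4 bit partition at the 8-bit mode is an assumption for illustration; the abstract does not state the paper's exact split.

```python
def sar_ss_adc(vin, vref=1.0, sar_bits=4, ss_bits=4):
    # SAR stage: binary-search the MSBs against a DAC voltage.
    code, vdac = 0, 0.0
    for i in range(sar_bits - 1, -1, -1):
        trial = vdac + vref * (1 << i) / (1 << sar_bits)
        if vin >= trial:
            vdac, code = trial, code | (1 << i)
    # Single-slope stage: count a ramp through the remaining residue.
    residue = vin - vdac
    lsb = vref / (1 << (sar_bits + ss_bits))
    count = 0
    while count < (1 << ss_bits) - 1 and residue >= (count + 1) * lsb:
        count += 1
    return (code << ss_bits) | count

print(sar_ss_adc(0.5))   # mid-scale input -> code 128 of 256
```

The hybrid pays only `sar_bits` comparator decisions plus at most `2**ss_bits` ramp counts, instead of a full 256-step ramp, which is the speed/resolution trade the reconfigurability exposes.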
Title: A Column-Parallel Time-Interleaved SAR/SS ADC for Computing in Memory with 2-8bit Reconfigurable Resolution
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168658
Seungyong Lee, Geonu Yun, Hyuk-Jae Lee
Recently, the large number of parameters in Transformer-based language models has caused memory shortages during training. Although solutions such as mixed precision and model parallelism have been proposed, they induce communication overhead and require the programmer to modify the model. To address this issue, we propose a scheme that compresses activation data in memory, reducing memory usage during training in a user-transparent manner. The compression algorithm gathers activation data into a block and compresses it, using base-delta compression for the exponent and bit-plane zero compression for the sign and mantissa. The important bits are then arranged in order, and LSB truncation is applied to fit the target size. The proposed algorithm achieves a compression ratio of 2.09 for the sign, 2.04 for the exponent, and 1.21 for the mantissa; applying truncation on top yields an overall ratio of 3.2, and we confirm that GPT-2 training converges with the compression.
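The exponent path can be sketched as follows: gather a block of fp32 activations, extract the 8 exponent bits of each, and store them base-delta (one base plus narrow deltas), falling back to raw storage when the deltas do not fit. The block size and delta width here are assumptions, not the paper's parameters.

```python
import numpy as np

def exponents(block):
    # Reinterpret fp32 values as uint32 and pull out the 8-bit exponent field.
    bits = np.frombuffer(np.asarray(block, np.float32).tobytes(), np.uint32)
    return ((bits >> 23) & 0xFF).astype(np.int64)

def base_delta_size_bits(exp, delta_bits=4):
    base = exp.min()
    deltas = exp - base
    if deltas.max() >= (1 << delta_bits):
        return 8 * len(exp)              # deltas too wide: store raw
    return 8 + delta_bits * len(exp)     # one 8-bit base + narrow deltas

block = np.random.default_rng(1).normal(0, 1, 64).astype(np.float32)
exp = exponents(block)
raw, packed = 8 * len(exp), base_delta_size_bits(exp)
print(raw / packed)                      # compression ratio on the exponent field
```

Activation exponents cluster in a narrow range, which is why base-delta works well on that field while the mantissa compresses poorly.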
Title: In-memory Activation Compression for GPT Training
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168646
Kasem Khalil, Ashok Kumar V, M. Bayoumi
Convolutional Neural Network (CNN) accelerators are highly beneficial for mobile and resource-constrained devices, and designing a power-economic accelerator remains a research challenge. This paper proposes a CNN accelerator with low power consumption and acceptable performance. The proposed method pipelines the kernels used in the convolution process around a shared multiplication-and-accumulation block: the kernels operate in sequence, each performing a different operation at a time. The method also schedules a series of operations between the kernels and the memory weights to speed up convolution. The accelerator is implemented in VHDL on an FPGA (Altera Arria 10 GX). Results show that the proposed method achieves an energy efficiency of 26.37 GOPS/W with lower power consumption than the existing method, at acceptable resource usage and performance, making it well suited for small, constrained devices.
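The shared-MAC organization can be modeled functionally (an assumed reading of the abstract, not the paper's RTL): several kernels take turns on a single multiplier-accumulator, each occupying a different pipeline slot, so the arithmetic hardware cost stays near one MAC unit.

```python
import numpy as np

def conv_shared_mac(x, kernels):
    # 1-D convolution where every multiply-accumulate goes through one
    # shared MAC "unit"; mac_ops counts its sequential occupancy.
    K = len(kernels[0])
    out = np.zeros((len(kernels), len(x) - K + 1))
    mac_ops = 0
    for pos in range(out.shape[1]):
        for t in range(K):                   # pipeline time slots
            for k, w in enumerate(kernels):  # kernels rotate through the MAC
                out[k, pos] += w[t] * x[pos + t]
                mac_ops += 1
    return out, mac_ops

x = np.arange(8, dtype=float)
kernels = [np.array([1.0, -1.0]),            # first-difference kernel
           np.array([0.5, 0.5])]             # moving-average kernel
out, mac_ops = conv_shared_mac(x, kernels)
print(out[0], mac_ops)
```

Throughput scales down with the number of kernels sharing the unit, which is the power/performance trade the paper accepts for constrained devices.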
Title: Low-Power Convolutional Neural Network Accelerator on FPGA
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168645
Jiahao Liu, Xinyu Liu, Liang Zhou, L. Chang, Jun Zhou
Atrial fibrillation (AF) is a prevalent cardiovascular disease in the elderly that significantly increases the risk of stroke, heart failure, and other complications. While artificial neural networks (ANNs) have recently demonstrated high accuracy in ECG-based AF detection, their computational complexity makes real-time, long-term monitoring on low-power wearable devices challenging, which is critical for detecting paroxysmal AF. In this work, a lightweight convolutional neural network for AF detection is proposed, using a dual-channel binary feature extraction technique on single-lead short ECG recordings to achieve both high classification accuracy and low computational complexity. Evaluated on the 2017 PhysioNet/CinC Challenge dataset, the proposed method achieves 93.6% sensitivity and a 0.81 F1 score for AF detection. Moreover, the design uses only 1.83M parameters, up to 27x fewer than prior works, and needs only 57M MACs for calculation. As a result, it is suitable for deployment in low-power wearable devices for long-term AF monitoring.
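To see why binary features cut computation (a generic illustration; the paper's dual-channel binary feature extraction is not specified at this level), note that with ±1-valued features and weights a dot product collapses to XNOR plus popcount on packed bits, replacing multiplies entirely.

```python
def bin_dot(a_bits, w_bits, n):
    # a_bits/w_bits: n features packed into an int; bit=1 means +1, bit=0 means -1.
    same = ~(a_bits ^ w_bits) & ((1 << n) - 1)   # XNOR: positions that agree
    pop = bin(same).count("1")
    return 2 * pop - n                           # agreements minus disagreements

a = [ 1, -1,  1,  1, -1,  1, -1, -1]
w = [ 1,  1,  1, -1, -1, -1, -1,  1]
pack = lambda v: sum((x > 0) << i for i, x in enumerate(v))
ref = sum(x * y for x, y in zip(a, w))           # ordinary dot product
print(bin_dot(pack(a), pack(w), len(a)), ref)    # the two agree
```

On hardware, the XNOR/popcount form maps to a handful of gates per feature, which is how binarization keeps the MAC budget low enough for wearables.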
Title: A Lightweight Convolutional Neural Network for Atrial Fibrillation Detection Using Dual-Channel Binary Features from Single-Lead Short ECG
Pub Date : 2023-06-11 DOI: 10.1109/AICAS57966.2023.10168599
D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand
A robust, fully differential multiply-and-accumulate (MAC) scheme for analog compute-in-memory (CIM) architectures is proposed in this article. The proposed method achieves a high signal margin for a 4-bit CIM architecture thanks to fully differential voltage changes on the read bit-lines (RBL/RBLB). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14x, 5.82x, and 10.24x higher than the state-of-the-art. The scheme is robust against process, voltage, and temperature (PVT) variations, achieving a variability metric (σ/μ) of 3.64%, which is 2.36x and 2.66x lower than reported works. The architecture achieves an energy efficiency of 2.53 TOPS/W at a 1 V supply in 65 nm CMOS technology, 6.2x more efficient than the digital baseline HW [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST dataset with a LeNet-5 CNN model. The figure of merit (FoM) of the proposed design is 355, which is 3.28x, 3.58x, and 17.75x higher than the state-of-the-art.
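A toy numeric model shows why fully differential sensing helps the signal margin (the voltage values and the exact sensing mechanism here are assumptions, not the paper's circuit): RBL and RBLB move in opposite directions per MAC step, so their difference doubles the per-step swing and cancels common-mode disturbance.

```python
# Powers of two are used so the floating-point arithmetic below is exact.
v_step = 2**-6     # ~16 mV per-step bit-line swing, assumed
cm_noise = 2**-8   # ~4 mV disturbance common to both bit-lines, assumed

def single_ended(mac):
    return mac * v_step + cm_noise       # noise lands directly in the signal

def differential(mac):
    rbl  =  mac * v_step + cm_noise
    rblb = -mac * v_step + cm_noise
    return rbl - rblb                    # 2*mac*v_step; common mode cancels

margin_se = single_ended(1) - single_ended(0)
margin_diff = differential(1) - differential(0)
print(margin_se, margin_diff)            # differential margin is doubled
```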
Title: A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application