Pub Date : 2024-08-21 DOI: 10.1109/TBCAS.2024.3437554
{"title":"IEEE Circuits and Systems Society Information","authors":"","doi":"10.1109/TBCAS.2024.3437554","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3437554","url":null,"abstract":"","PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"18 4","pages":"C3-C3"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643420","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142021656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-21 DOI: 10.1109/TBCAS.2024.3439815
{"title":"TechRxiv: Share Your Preprint Research with the World!","authors":"","doi":"10.1109/TBCAS.2024.3439815","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3439815","url":null,"abstract":"","PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"18 4","pages":"951-951"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643423","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142021664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-20 DOI: 10.1109/TBCAS.2024.3446660
Deyu Wang, Xiaoze Yan, Yu Yang, Dimitrios Stathis, Ahmed Hemani, Anders Lansner, Jiawei Xu, Li-Rong Zheng, Zhuo Zou
Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions like associative memory. However, the current approach of using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high-performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200×10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72× and a power reduction exceeding 5.28× under the same network configuration. Compared with state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task.
{"title":"Scalable Multi-FPGA HPC Architecture for Associative Memory System.","authors":"Deyu Wang, Xiaoze Yan, Yu Yang, Dimitrios Stathis, Ahmed Hemani, Anders Lansner, Jiawei Xu, Li-Rong Zheng, Zhuo Zou","doi":"10.1109/TBCAS.2024.3446660","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3446660","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142010109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3445174
Francesco Malanga, Gennaro Fratta, Giulia Acconcia, Ivan Rech
Time-Correlated Single Photon Counting (TCSPC) is a pivotal technique in low-light-detection applications, renowned for its exceptional sensitivity and bandwidth, and widely used in Fluorescence Lifetime Imaging Microscopy (FLIM) and quantum optics. Despite these strengths, TCSPC is significantly hindered by the pile-up effect, which may distort measurements at high photon-detection rates. Overcoming pile-up is challenging: traditional solutions often involve complex post-processing or multichannel systems, which complicate the TCSPC setup and limit performance. A breakthrough that overcomes this issue is matching the photodetector dead time to an integer multiple of the laser period, which yields a distortionless histogram even under high illumination. Building on this concept, we present an Active Quenching Circuit (AQC) developed in a high-voltage 150 nm technology, achieving unprecedented control over the Single Photon Avalanche Diode (SPAD) dead time. Our design compensates for Process, Voltage, and Temperature (PVT) variations, ensuring ultra-precise and robust dead-time tuning. The presented AQC achieves a dead-time resolution of 50 ps, suitable for time-resolved experiments within a selectable range of laser frequencies from 20 to 100 MHz, while maintaining close-to-ideal linearity in dead-time control. Experimental validation through fluorescence measurements reveals a distortion as low as 0.43% under elevated count-rate conditions, highlighting the efficacy of our circuit in overcoming the pile-up limitation.
{"title":"Integrated Active Quenching Circuit for high-rate and distortionless SPAD-based time-resolved fluorescence applications.","authors":"Francesco Malanga, Gennaro Fratta, Giulia Acconcia, Ivan Rech","doi":"10.1109/TBCAS.2024.3445174","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3445174","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
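The dead-time matching idea in the entry above (dead time set to an integer multiple of the laser period, tuned in 50 ps steps for lasers between 20 and 100 MHz) can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions: the function name, the `min_dead_time_ps` hold-off parameter, and the rounding policy are hypothetical and are not taken from the paper.

```python
import math


def matched_dead_time_ps(laser_freq_mhz, min_dead_time_ps, step_ps=50):
    """Pick a dead time equal to an integer multiple of the laser period.

    laser_freq_mhz   -- laser repetition rate in MHz (20 to 100 MHz in the abstract)
    min_dead_time_ps -- hypothetical minimum hold-off the detector needs, in ps
    step_ps          -- tuning resolution in ps (50 ps in the abstract)

    Returns (n, target_ps, setting_ps): the multiple n, the ideal dead time
    n * period, and the nearest value reachable on the step_ps grid.
    """
    period_ps = 1e6 / laser_freq_mhz  # period of a 1 MHz laser is 1e6 ps
    n = math.ceil(min_dead_time_ps / period_ps)  # smallest multiple covering the hold-off
    target_ps = n * period_ps
    setting_ps = round(target_ps / step_ps) * step_ps  # quantize to the tuning grid
    return n, target_ps, setting_ps


# Show the quantization residual for a few repetition rates (assumed 20 ns hold-off).
for freq in (20, 30, 80, 100):
    n, target, setting = matched_dead_time_ps(freq, min_dead_time_ps=20_000)
    print(f"{freq} MHz: n={n}, target={target:.1f} ps, setting={setting} ps, "
          f"residual={target - setting:+.1f} ps")
```

With a 50 ps grid, the residual between the ideal multiple and the programmed value cannot exceed 25 ps in this model, which is one way to read why a fine tuning step matters across the whole frequency range.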
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3446177
Kalkidan Deme Muleta, Bai-Sun Kong
Training a spiking neural network (SNN) with spike-timing-dependent plasticity (STDP) for image classification usually requires a large number of neurons to extract representative features and/or an external classifier. Conventional bio-inspired learning methods do not cover all possible learning opportunities, resulting in limited performance. We propose a new bio-plausible learning rule, target-modulated STDP (TSTDP), for higher learning efficiency and accuracy. We also propose an SNN architecture trainable with TSTDP using temporally encoded spikes to obtain higher accuracy and improved area efficiency without using an external classifier. Using the MNIST dataset, we show that the proposed design achieves an accuracy of 92%, an improvement of up to 7% over conventional networks of similar size. For similar accuracy, the network size is up to 75% smaller, and the design also demonstrates stronger resilience to process variations. Benchmarking on the CIFAR-10 and neuromorphic DVS gesture datasets shows accuracy improvements of up to 12.4% and 3.6%, respectively.
{"title":"RRAM-Based Spiking Neural Network with Target-Modulated Spike-Timing-Dependent Plasticity.","authors":"Kalkidan Deme Muleta, Bai-Sun Kong","doi":"10.1109/TBCAS.2024.3446177","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3446177","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3445968
Yi-Lin Lo, Yu-Chen Lo, Chia-Hsiang Yang
Hand-held ultrasound devices are widely used in healthcare, where power-efficient, real-time imaging is essential. This work presents the world's first ultrasound imaging processor supporting advanced modes, including vector flow imaging and elastography imaging. Plane-wave beamforming is utilized to ensure that the pulse repetition frequency (PRF) is sufficiently high for the advanced modes. The storage size and power consumption are minimized through algorithm-architecture co-optimization. The proposed plane-wave beamforming reduces the storage size of the required delay values by 43.7%. By exchanging the processing order, the storage size is reduced by 78.1% for elastography imaging. Parallel beamforming and interleaved firing are employed to achieve real-time imaging for all the supported modes. Fabricated in 40-nm CMOS technology, the proposed processor integrates 4.7M logic gates in a core area of 3.24 mm². This work achieves a 20.3× higher beamforming rate with 5.3-to-29.1× lower power consumption than the state-of-the-art design. It also has 60% lower hardware complexity (in terms of gate count), in addition to supporting the advanced modes.
{"title":"A 40-nm 169mW Ultrasound Imaging Processor Supporting Advanced Modes for Hand-Held Devices.","authors":"Yi-Lin Lo, Yu-Chen Lo, Chia-Hsiang Yang","doi":"10.1109/TBCAS.2024.3445968","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3445968","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142006152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-13 DOI: 10.1109/TBCAS.2024.3442250
Shuenn-Yuh Lee, Ming-Yueh Ku, Yen-Hsing Tsai, Chou-Ching Lin
Epilepsy is a globally distributed chronic neurological disorder that may pose a threat to life without warning. Therefore, the use of wearable devices for real-time detection and treatment of epilepsy is crucial. Additionally, personalizing disease detection algorithms for individual users is a challenge in clinical applications. Some studies have proposed seizure detection algorithms with convolutional neural networks (CNNs) and programmable hardware architectures for speeding up CNN inference. However, personalization of seizure detection algorithms still cannot be performed on these hardware architectures. Consequently, this study proposes three key contributions to address these challenges: a real-time seizure detection and personalization algorithm, a programmable reduced instruction set computer-V (RISC-V) deep learning accelerator (DLA) hardware architecture (RVDLAHA), and a dedicated RISC-V DLA (RVDLA) compiler. In animal experiments with lab rats, the proposed CNN-based seizure detection algorithm obtains an accuracy of 99.5% with 32-bit floating-point arithmetic and 99.3% with 16-bit fixed-point arithmetic. Additionally, the proposed personalization algorithm increases the testing accuracy across different databases from 85.0% to 92.9%. The RVDLAHA is implemented on a Xilinx PYNQ-Z2, with a power consumption of only 0.107 W at an operating frequency of 1 MHz. The four steps, namely raw data input, preprocessing, detection, and personalization, require only 17.8, 1.0, 1.1, and 1.3 ms, respectively. With this hardware architecture, the seizure detection and personalization algorithm can provide on-device real-time monitoring.
{"title":"RVDLAHA: An RISC-V DLA Hardware Architecture for On-Device Real-Time Seizure Detection and Personalization in Wearable Applications.","authors":"Shuenn-Yuh Lee, Ming-Yueh Ku, Yen-Hsing Tsai, Chou-Ching Lin","doi":"10.1109/TBCAS.2024.3442250","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3442250","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141977492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-07 DOI: 10.1109/TBCAS.2024.3439619
Guanjie Gu, Changgui Yang, Jian Zhao, Sijun Du, Yuxuan Luo, Bo Zhao
Body Channel Communication (BCC) utilizes the body surface as a low-loss signal transmission medium, reducing the power consumption of wireless wearable devices. However, the effective communication range on the human body is limited in state-of-the-art BCC transceivers, where the signal loss between the body surface and the BCC receiver remains one of the main bottlenecks. To reduce this interface loss, the BCC receiver needs a high input impedance, but DC-biasing circuits decrease the input impedance. In this work, a dynamically-sampling interface front end (IFE) is proposed to eliminate the DC voltage bias, resulting in a high input impedance of 90 kΩ and an RF-IF conversion gain of 94 dB, which reduce the interface loss in long-range BCC applications. The BCC transceiver chip is fabricated in a 55 nm CMOS process and occupies a die area of 0.123 mm². Measured results show that the chip extends the BCC range to 2 m for both the forward and backward paths, with the transmitter and receiver consuming 711 μW in total.
{"title":"A 2m-Range 711μW Body Channel Communication Transceiver Featuring Dynamically-Sampling Bias-Free Interface Front End.","authors":"Guanjie Gu, Changgui Yang, Jian Zhao, Sijun Du, Yuxuan Luo, Bo Zhao","doi":"10.1109/TBCAS.2024.3439619","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3439619","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141903948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-01 DOI: 10.1109/TBCAS.2024.3436837
Qi Cheng, Xiaofang Hu, He Xiao, Yue Zhou, Shukai Duan
In recent years, the combination of the Attention mechanism and deep learning has found a wide range of applications in medical imaging. However, because of its complex computational process, existing hardware architectures suffer from high resource consumption or low accuracy, and deploying Attention efficiently on DNN accelerators is a challenge. This paper proposes an online-programmable Attention hardware architecture based on a compute-in-memory (CIM) macro, which reduces the complexity of Attention in hardware and improves integration density, energy efficiency, and calculation accuracy. First, the Attention computation process is decomposed into multiple cascaded combinatorial matrix operations to reduce the complexity of its hardware implementation; second, to reduce the influence of the non-ideal characteristics of the hardware, an online-programmable CIM architecture is designed to improve calculation accuracy by dynamically adjusting the weights; and lastly, SPICE simulation verifies that the proposed Attention hardware architecture can be applied to the inference of deep neural networks. Based on a 100 nm CMOS process, compared with traditional Attention hardware architectures, the integration density and energy efficiency are increased by at least 91.38 times, and latency and computing efficiency are improved by about 12.5 times.
{"title":"High-Performance Method and Architecture for Attention Computation in DNN Inference.","authors":"Qi Cheng, Xiaofang Hu, He Xiao, Yue Zhou, Shukai Duan","doi":"10.1109/TBCAS.2024.3436837","DOIUrl":"10.1109/TBCAS.2024.3436837","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141876988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
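The entry above describes decomposing Attention into cascaded matrix operations. The abstract does not give the paper's own decomposition, so the sketch below shows only the textbook scaled dot-product attention written as an explicit chain of matrix steps (score, scale, row-wise softmax, weighted sum). The helper names are hypothetical, and plain Python lists are used so that the block has no dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])                                   # key/query dimension
    scores = matmul(q, transpose(k))                # step 1: Q K^T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # step 2: scale
    weights = softmax_rows(scaled)                  # step 3: normalize each row
    return matmul(weights, v)                       # step 4: weighted sum of values


# Identical keys give uniform weights, so each output row is the mean of V's rows.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```

Steps 1 and 4 are the matrix products that a compute-in-memory array would typically be expected to absorb, while the softmax in step 3 is the non-linear stage; how the paper actually partitions these stages is not stated in the abstract.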
Pub Date : 2024-07-30 DOI: 10.1109/TBCAS.2024.3435718
Shuenn-Yuh Lee, Ming-Yueh Ku, Wei-Cheng Tseng, Ju-Yi Chen
This work proposes a classification system for arrhythmias, aiming to enhance the efficiency of the diagnostic process for cardiologists. The proposed algorithm includes a naive preprocessing procedure for electrocardiography (ECG) data applicable to various ECG databases. Additionally, this work proposes an ultralightweight model for arrhythmia classification based on a convolutional neural network that incorporates R-peak interval features to represent long-term rhythm information, thereby improving the model's classification performance. The proposed model is trained and tested on the MIT-BIH and NCKU-CBIC databases in accordance with the classification standards of the Association for the Advancement of Medical Instrumentation (AAMI), achieving high accuracies of 98.32% and 97.1%. This work applies the arrhythmia classification algorithm to a web-based system, thus providing a graphical interface. The cloud-based execution of automated artificial intelligence (AI) classification allows cardiologists and patients to view ECG wave conditions instantly, thereby remarkably enhancing the quality of medical examination. This work also designs a customized integrated circuit for the hardware implementation of an AI accelerator. The accelerator utilizes a parallelized processing element array architecture to perform convolution and fully connected layer operations. It introduces hybrid stationary techniques that combine input-stationary and weight-stationary modes to drastically increase data reuse and reduce hardware execution cycles and power consumption, ultimately achieving high-performance computing. The accelerator is implemented as a chip in the TSMC 180 nm CMOS process. It exhibits a power consumption of 122 μW, a classification latency of 6.8 ms, and an energy efficiency of 0.83 μJ/classification.
{"title":"AI Accelerator with Ultralightweight Time-Period CNN-Based Model for Arrhythmia Classification.","authors":"Shuenn-Yuh Lee, Ming-Yueh Ku, Wei-Cheng Tseng, Ju-Yi Chen","doi":"10.1109/TBCAS.2024.3435718","DOIUrl":"https://doi.org/10.1109/TBCAS.2024.3435718","url":null,"PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141857446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
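The three figures reported at the end of the entry above (122 μW, 6.8 ms, 0.83 μJ per classification) are linked by energy = power × time. The short check below assumes the quoted power is the average over the whole classification latency, which the abstract does not state explicitly; the function name is hypothetical.

```python
def energy_per_classification_uj(power_uw, latency_ms):
    """Energy in microjoules from average power (uW) and latency (ms).

    uW * ms gives nanojoules, so divide by 1000 to express the result in uJ.
    """
    return power_uw * latency_ms / 1000.0


energy = energy_per_classification_uj(122.0, 6.8)
print(f"{energy:.2f} uJ/classification")  # 122 uW * 6.8 ms = 829.6 nJ
```

Under that assumption the product rounds to the 0.83 μJ per classification quoted in the abstract, so the reported power, latency, and energy figures are mutually consistent.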