
IEEE Transactions on Biomedical Circuits and Systems: Latest Articles

IEEE Circuits and Systems Society Information
Pub Date : 2024-08-21 DOI: 10.1109/TBCAS.2024.3437554
IEEE Transactions on Biomedical Circuits and Systems, vol. 18, no. 4, p. C3.
Citations: 0
TechRxiv: Share Your Preprint Research with the World!
Pub Date : 2024-08-21 DOI: 10.1109/TBCAS.2024.3439815
IEEE Transactions on Biomedical Circuits and Systems, vol. 18, no. 4, p. 951.
Citations: 0
Scalable Multi-FPGA HPC Architecture for Associative Memory System.
Pub Date : 2024-08-20 DOI: 10.1109/TBCAS.2024.3446660
Deyu Wang, Xiaoze Yan, Yu Yang, Dimitrios Stathis, Ahmed Hemani, Anders Lansner, Jiawei Xu, Li-Rong Zheng, Zhuo Zou

Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions like associative memory. However, the current approach of using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high-performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200×10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72× and a power reduction exceeding 5.28× under the same network configuration. Compared with state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task.

Citations: 0
Integrated Active Quenching Circuit for high-rate and distortionless SPAD-based time-resolved fluorescence applications.
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3445174
Francesco Malanga, Gennaro Fratta, Giulia Acconcia, Ivan Rech

Time-Correlated Single Photon Counting (TCSPC) is a pivotal technique in low-light-detection applications, renowned for its exceptional sensitivity and bandwidth, and widely used in Fluorescence Lifetime Imaging Microscopy (FLIM) and quantum optics. Despite its features, TCSPC is significantly hindered by the pile-up effect, which may distort measurements at high photon-detection rates. Overcoming pile-up is challenging, with traditional solutions often involving complex post-processing or multichannel systems, complicating the TCSPC setup and limiting performance. A breakthrough to overcome this issue is matching the photodetector dead time to an integer multiple of the laser period, obtaining a distortionless histogram even under high illumination conditions. Building on this concept, we present an Active Quenching Circuit (AQC) developed in high-voltage 150 nm technology, achieving unprecedented control over the Single Photon Avalanche Diode (SPAD) dead time. Our design compensates for Process, Voltage, and Temperature (PVT) variations, ensuring ultra-precise and robust dead-time tuning. The presented AQC achieves a dead-time resolution of 50 ps suitable for time-resolved experiments within a selectable range of laser frequencies from 20 to 100 MHz, maintaining close-to-ideal linearity in dead-time control. Experimental validations through fluorescence measurements reveal a distortion as low as 0.43% under elevated count-rate conditions, highlighting the efficacy of our circuit in overcoming the pile-up limitation.
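
The dead-time matching idea in this abstract can be illustrated with a few lines of arithmetic. The sketch below is not the authors' control scheme; it only picks the smallest integer multiple of the laser period that exceeds a minimum dead time and snaps it to a 50 ps tuning step. The function name and the 30 ns minimum dead time are hypothetical values chosen for illustration; only the 50 ps step and the 20 to 100 MHz range come from the abstract.

```python
import math

def dead_time_setting(laser_freq_hz, min_dead_time_ps, step_ps=50.0):
    """Pick the smallest integer multiple of the laser period that is at
    least min_dead_time_ps, then snap it to the tuning step.

    Returns (n_periods, target_ps, programmed_ps).
    """
    period_ps = 1e12 / laser_freq_hz
    # Small epsilon so an exact multiple is not bumped up by float rounding.
    n_periods = max(1, math.ceil(min_dead_time_ps / period_ps - 1e-9))
    target_ps = n_periods * period_ps
    programmed_ps = round(target_ps / step_ps) * step_ps
    return n_periods, target_ps, programmed_ps

# Hypothetical 30 ns minimum dead time (assumption, not from the abstract).
for f_mhz in (20, 40, 80, 100):
    n, target, programmed = dead_time_setting(f_mhz * 1e6, min_dead_time_ps=30_000.0)
    print(f"{f_mhz} MHz: n={n}, target={target:.0f} ps, "
          f"residual={programmed - target:+.1f} ps")
```

The residual between the programmed and the ideal dead time is what a finer tuning step reduces; for laser periods that are not multiples of the step, it is bounded by half a step.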

Citations: 0
RRAM-Based Spiking Neural Network with Target-Modulated Spike-Timing-Dependent Plasticity.
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3446177
Kalkidan Deme Muleta, Bai-Sun Kong

Training a spiking neural network (SNN) with spike-timing-dependent plasticity (STDP) for image classification usually requires many neurons to extract representative features and/or needs an external classifier. Conventional bio-inspired learning methods do not cover all possible learning opportunities, resulting in limited performance. We propose a new bio-plausible learning rule, target-modulated STDP (TSTDP), for higher learning efficiency and accuracy. We also propose an SNN architecture trainable with TSTDP using temporally encoded spikes to obtain higher accuracy and improved area efficiency without using an external classifier. Using the MNIST dataset, we have shown that the proposed design achieves an accuracy of 92%, an improvement of up to 7% compared to conventional networks of similar sizes. For similar accuracy, an up to 75% smaller network size has been shown, along with stronger resilience to process variations. Benchmarking on the CIFAR-10 and neuromorphic DVS gesture datasets shows an accuracy improvement of up to 12.4% and 3.6%, respectively.
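
For readers unfamiliar with STDP, the following is a minimal sketch of the textbook pair-based rule, in which a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. It is not the TSTDP rule proposed in the paper, whose details are not given in the abstract. All amplitudes, time constants, and function names are illustrative assumptions.

```python
import math

def stdp_delta(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for dt_ms = t_post - t_pre.

    Parameter values are illustrative assumptions, not taken from the paper.
    """
    if dt_ms > 0:   # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:   # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

def update_weight(w, t_pre_ms, t_post_ms, w_min=0.0, w_max=1.0):
    """Apply one STDP update and clip the weight to [w_min, w_max]."""
    dw = stdp_delta(t_post_ms - t_pre_ms)
    return min(w_max, max(w_min, w + dw))

w = 0.5
w = update_weight(w, t_pre_ms=10.0, t_post_ms=15.0)  # causal pair: w increases
w = update_weight(w, t_pre_ms=30.0, t_post_ms=22.0)  # anti-causal pair: w decreases
print(w)
```

Clipping the weight to a bounded range mirrors the limited conductance window of a resistive memory cell, which is one reason bounded rules are common in RRAM-based designs.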

Citations: 0
A 40-nm 169mW Ultrasound Imaging Processor Supporting Advanced Modes for Hand-Held Devices.
Pub Date : 2024-08-19 DOI: 10.1109/TBCAS.2024.3445968
Yi-Lin Lo, Yu-Chen Lo, Chia-Hsiang Yang

Hand-held ultrasound devices are widely used in healthcare, where power-efficient, real-time imaging is essential. This work presents the world's first ultrasound imaging processor supporting advanced modes, including vector flow imaging and elastography imaging. Plane-wave beamforming is utilized to ensure that the pulse repetition frequency (PRF) is sufficiently high for the advanced modes. The storage size and power consumption are minimized through algorithm-architecture co-optimization. The proposed plane-wave beamforming reduces the storage size of the required delay values by 43.7%. By exchanging the processing order, the storage size is reduced by 78.1% for elastography imaging. Parallel beamforming and interleaved firing are employed to achieve real-time imaging for all the supported modes. Fabricated in 40-nm CMOS technology, the proposed processor integrates 4.7M logic gates in a core area of 3.24 mm². This work achieves a 20.3× higher beamforming rate with 5.3-to-29.1× lower power consumption than the state-of-the-art design. It also has 60% lower hardware complexity (in terms of gate count), in addition to the capability for supporting the advanced modes.
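
The delay values mentioned in this abstract are the per-pixel, per-element times of flight used in delay-and-sum beamforming. The sketch below shows the standard plane-wave delay formula (transmit path of a steered plane wave plus the return path to one receive element). It is a generic illustration, not the storage-optimized scheme of the paper. The function name, coordinates, and the 1540 m/s speed of sound are assumptions.

```python
import math

def plane_wave_delay(x, z, x_elem, theta_rad=0.0, c=1540.0):
    """Round-trip delay in seconds for a pixel at (x, z) and a receive
    element at lateral position x_elem, for a plane wave steered by
    theta_rad. Lengths are in metres; c is an assumed speed of sound.
    """
    t_tx = (z * math.cos(theta_rad) + x * math.sin(theta_rad)) / c
    t_rx = math.hypot(x - x_elem, z) / c
    return t_tx + t_rx

# Delay table for one pixel across a small hypothetical 8-element array
# with 0.3 mm pitch, centred on x = 0.
pitch = 0.3e-3
elements = [(i - 3.5) * pitch for i in range(8)]
delays = [plane_wave_delay(0.0, 20e-3, xe) for xe in elements]
print([f"{d * 1e6:.3f} us" for d in delays])
```

A full image needs one such delay per pixel and element, which is why the size of the delay table is a natural target for the kind of storage reduction the abstract reports.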

Citations: 0
RVDLAHA: An RISC-V DLA Hardware Architecture for On-Device Real-Time Seizure Detection and Personalization in Wearable Applications.
Pub Date : 2024-08-13 DOI: 10.1109/TBCAS.2024.3442250
Shuenn-Yuh Lee, Ming-Yueh Ku, Yen-Hsing Tsai, Chou-Ching Lin

Epilepsy is a globally distributed chronic neurological disorder that may pose a threat to life without warning. Therefore, the use of wearable devices for real-time detection and treatment of epilepsy is crucial. Additionally, personalizing disease detection algorithms for individual users is also a challenge in clinical applications. Some studies have proposed seizure detection algorithms with convolutional neural networks (CNNs) and programmable hardware architectures for speeding up the process of CNN inference. However, personalizing seizure detection algorithms could still not be performed on these hardware architectures. Consequently, this study proposes three key contributions to address the challenges: a real-time seizure detection and personalization algorithm, a programmable reduced instruction set computer-V (RISC-V) deep learning accelerator (DLA) hardware architecture (RVDLAHA), and a dedicated RISC-V DLA (RVDLA) compiler. In animal experiments with lab rats, the proposed CNN-based seizure detection algorithm obtains an accuracy of 99.5% for a 32-bit floating point and an accuracy of 99.3% for a 16-bit fixed point. Additionally, the proposed personalization algorithm increases the testing accuracy across different databases from 85.0% to 92.9%. The RVDLAHA is implemented on Xilinx PYNQ-Z2, with a power consumption of only 0.107 W at an operating frequency of 1 MHz. Each step, including raw data input, preprocessing, detection, and personalization, requires only 17.8, 1.0, 1.1, and 1.3 ms, respectively. With the hardware architecture, the seizure detection and personalization algorithm can provide on-device real-time monitoring.
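
The small accuracy gap between the 32-bit floating-point and 16-bit fixed-point results reflects quantization error. The sketch below shows a generic signed 16-bit fixed-point conversion with saturation. The number of fractional bits (8) and the function names are assumptions; the abstract does not state the format used in the RVDLAHA.

```python
def to_fixed16(x, frac_bits=8):
    """Quantize a float to a signed 16-bit fixed-point integer.

    frac_bits=8 is an assumed format, not one stated in the abstract.
    Values outside the representable range saturate.
    """
    q = round(x * (1 << frac_bits))
    return max(-32768, min(32767, q))

def from_fixed16(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

for value in (1.5, 0.3, -2.71828, 1000.0):
    q = to_fixed16(value)
    print(value, q, from_fixed16(q))
```

With 8 fractional bits, the rounding error of any in-range value is at most 1/512, while out-of-range values saturate instead of wrapping around.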

Citations: 0
A 2m-Range 711μW Body Channel Communication Transceiver Featuring Dynamically-Sampling Bias-Free Interface Front End.
Pub Date : 2024-08-07 DOI: 10.1109/TBCAS.2024.3439619
Guanjie Gu, Changgui Yang, Jian Zhao, Sijun Du, Yuxuan Luo, Bo Zhao

Body Channel Communication (BCC) utilizes the body surface as a low-loss signal transmission medium, reducing the power consumption of wireless wearable devices. However, the effective communication range on the human body is limited in state-of-the-art BCC transceivers, where the signal loss between the body surface and the BCC receiver remains one of the main bottlenecks. To reduce the interface loss, the BCC receiver needs a high input impedance, but DC-biasing circuits decrease the input impedance. In this work, a dynamically-sampling interface front end (IFE) is proposed to eliminate the DC voltage bias, resulting in a high input impedance of 90 kΩ and an RF-IF conversion gain of 94 dB to reduce the interface loss in long-range BCC applications. The BCC transceiver chip is fabricated in a 55 nm CMOS process, taking a die area of 0.123 mm². Measured results show that the chip extends the BCC range to 2 m for both the forward and backward paths, where the transmitter and receiver consume 711 μW in total.

Citations: 0
High-Performance Method and Architecture for Attention Computation in DNN Inference.
Pub Date : 2024-08-01 DOI: 10.1109/TBCAS.2024.3436837
Qi Cheng, Xiaofang Hu, He Xiao, Yue Zhou, Shukai Duan

In recent years, the combination of the Attention mechanism and deep learning has found a wide range of applications in the field of medical imaging. However, due to its complex computational processes, existing hardware architectures have high resource consumption or low accuracy, and deploying them efficiently to DNN accelerators is a challenge. This paper proposes an online-programmable Attention hardware architecture based on a compute-in-memory (CIM) macro, which reduces the complexity of Attention in hardware and improves integration density, energy efficiency, and calculation accuracy. First, the Attention computation process is decomposed into multiple cascaded combinatorial matrix operations to reduce the complexity of its implementation on the hardware side; second, to reduce the influence of the non-ideal characteristics of the hardware, an online-programmable CIM architecture is designed to improve calculation accuracy by dynamically adjusting the weights; and lastly, SPICE simulation verifies that the proposed Attention hardware architecture can be applied to the inference of deep neural networks. Based on a 100 nm CMOS process, compared with traditional Attention hardware architectures, the integration density and energy efficiency are increased by at least 91.38 times, and latency and computing efficiency are improved by about 12.5 times.
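
To make the idea of Attention as a cascade of matrix operations concrete, the sketch below writes standard scaled dot-product attention as four explicit steps: a matrix product, a scaling, a row-wise softmax, and a second matrix product. This is the textbook formulation in plain Python, not the paper's hardware decomposition, which the abstract does not detail. All function names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = matmul(q, transpose(k))                              # step 1: Q K^T
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # step 2: scale
    weights = softmax_rows(scaled)                                # step 3: softmax
    return matmul(weights, v)                                     # step 4: weights V

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [1.0, 0.0]]
v = [[2.0], [4.0]]
print(attention(q, k, v))
```

Steps 1 and 4 are plain matrix products, the kind of operation a compute-in-memory array performs natively, while the softmax in step 3 is the nonlinear part that typically needs separate handling.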

近年来,注意力机制与深度学习的结合在医学影像领域有着广泛的应用。然而,由于其计算过程复杂,现有的硬件架构存在资源消耗大或精度低的问题,如何将其高效地部署到 DNN 加速器上是一个难题。本文提出了一种基于内存计算(CIM)marco 的在线可编程 Attention 硬件架构,降低了 Attention 在硬件上的复杂度,提高了集成密度、能效和计算精度。首先,将Attention计算过程分解为多个级联组合矩阵运算,以降低其在硬件端的实现复杂度;其次,为了降低硬件非理想特性的影响,设计了一种在线可编程CIM架构,通过动态调整权重来提高计算精度;最后,通过Spice仿真验证了所提出的Attention硬件架构可应用于深度神经网络推理。基于 100nm CMOS 工艺,与传统的 Attention 硬件架构相比,集成密度和能效至少提高了 91.38 倍,延迟和计算效率提高了约 12.5 倍。
{"title":"High-Performance Method and Architecture for Attention Computation in DNN Inference.","authors":"Qi Cheng, Xiaofang Hu, He Xiao, Yue Zhou, Shukai Duan","doi":"10.1109/TBCAS.2024.3436837","DOIUrl":"10.1109/TBCAS.2024.3436837","url":null,"abstract":"<p><p>In recent years, The combination of Attention mechanism and deep learning has a wide range of applications in the field of medical imaging. However, due to its complex computational processes, existing hardware architectures have high resource consumption or low accuracy, and deploying them efficiently to DNN accelerators is a challenge. This paper proposes an online-programmable Attention hardware architecture based on compute-in-memory (CIM) marco, which reduces the complexity of Attention in hardware and improves integration density, energy efficiency, and calculation accuracy. First, the Attention computation process is decomposed into multiple cascaded combinatorial matrix operations to reduce the complexity of its implementation on the hardware side; second, in order to reduce the influence of the non-ideal characteristics of the hardware, an online-programmable CIM architecture is designed to improve calculation accuracy by dynamically adjusting the weights; and lastly, it is verified that the proposed Attention hardware architecture can be applied for the inference of deep neural networks through Spice simulation. 
Based on the 100nm CMOS process, compared with the traditional Attention hardware architectures, the integrated density and energy efficiency are increased by at least 91.38 times, and latency and computing efficiency are improved by about 12.5 times.</p>","PeriodicalId":94031,"journal":{"name":"IEEE transactions on biomedical circuits and systems","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141876988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AI Accelerator with Ultralightweight Time-Period CNN-Based Model for Arrhythmia Classification.
Pub Date : 2024-07-30 DOI: 10.1109/TBCAS.2024.3435718
Shuenn-Yuh Lee, Ming-Yueh Ku, Wei-Cheng Tseng, Ju-Yi Chen

This work proposes a classification system for arrhythmias, aiming to make the diagnostic process more efficient for cardiologists. The proposed algorithm includes a naive preprocessing procedure for electrocardiography (ECG) data that is applicable to various ECG databases. This work also proposes an ultralightweight arrhythmia classification model based on a convolutional neural network that incorporates R-peak interval features to represent long-term rhythm information, thereby improving classification performance. The model is trained and tested on the MIT-BIH and NCKU-CBIC databases in accordance with the classification standards of the Association for the Advancement of Medical Instrumentation (AAMI), achieving high accuracies of 98.32% and 97.1%. The arrhythmia classification algorithm is deployed in a web-based system with a graphical interface. Cloud-based execution of the automated artificial intelligence (AI) classification allows cardiologists and patients to view ECG wave conditions instantly, improving the quality of medical examination. This work also designs a customized integrated circuit for the hardware implementation of an AI accelerator. The accelerator uses a parallelized processing element array architecture to perform convolution and fully connected layer operations. It introduces a hybrid stationary technique that combines input-stationary and weight-stationary modes to increase data reuse and reduce hardware execution cycles and power consumption. The accelerator is implemented as a chip in the TSMC 180 nm CMOS process. It exhibits a power consumption of 122 μW, a classification latency of 6.8 ms, and an energy efficiency of 0.83 μJ/classification.
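
The three hardware figures at the end of the abstract can be cross-checked against each other. The sketch below assumes that the reported energy per classification is simply average power multiplied by classification latency; the abstract does not state how the figure was derived, and the function name is hypothetical.

```python
def energy_per_classification_uj(power_uw: float, latency_ms: float) -> float:
    """Energy per classification in microjoules, ASSUMING E = P * t.

    This relation is an assumption made for illustration; the abstract
    does not say how the energy figure was obtained.
    """
    # microwatts * milliseconds = nanojoules; divide by 1000 for microjoules.
    return power_uw * latency_ms / 1000.0


# Figures reported in the abstract: 122 uW power, 6.8 ms latency.
energy = energy_per_classification_uj(122.0, 6.8)
print(f"{energy:.2f} uJ per classification")
```

Under that assumption, 122 μW × 6.8 ms gives about 0.8296 μJ, which rounds to the reported 0.83 μJ/classification.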
Citations: 0