
Latest publications from the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

SNNOpt: An Application-Specific Design Framework for Spiking Neural Networks
Jingyu He, Ziyang Shen, Fengshi Tian, Jinbo Chen, Jie Yang, M. Sawan, Hsiang-Ting Chen, P. Bogdan, C. Tsui
We propose SNNOpt, a systematic application-specific hardware design methodology for Spiking Neural Networks (SNNs), which consists of three novel phases: 1) an Ollivier-Ricci-curvature (ORC)-based, architecture-aware network partitioning; 2) a reinforcement learning mapping strategy; and 3) a Bayesian optimization algorithm for NoC design space exploration. Experimental results show that SNNOpt achieves 47.45% lower runtime and 58.64% energy savings over state-of-the-art approaches.
DOI: 10.1109/AICAS57966.2023.10168605
Citations: 0
An Interpretable Pixel Intensity Reconstruction Model for Asynchronous Event Camera
Hongwei Shan, Lichen Feng, Yueqi Zhang, Zhangming Zhu
Event cameras with high temporal resolution and high dynamic range have great potential in computer vision (CV) tasks. To apply deep neural networks directly, an efficient method for reconstructing frame-based data from event-based data is necessary. In this work, the interpretable Event Represented Intensity (ERI) model, which recovers the logarithm of the intensity sensed by a dynamic vision pixel, is proposed for the first time. The amplitude-frequency characteristic of the recovered log-intensity is used to construct the frame-based image for CV tasks. Experimental results on the N-Caltech101 dataset show that the proposed ERI model achieves a classification accuracy of 79.20%, striking a better balance between performance and computation cost.
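A generic way to get from an asynchronous event stream to frame-based input is to accumulate event polarities per pixel. The sketch below illustrates that baseline conversion only; the ERI model itself recovers per-pixel log-intensity, which is not reproduced here. The `(t, x, y, p)` event layout and the frame dimensions are assumptions for illustration.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate polarity events (t, x, y, p) into a frame-based image.

    Generic event-to-frame conversion: ON events (p > 0) add +1,
    OFF events add -1 at the event's pixel location.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    for _, x, y, p in events:
        frame[int(y), int(x)] += 1.0 if p > 0 else -1.0
    return frame

# Toy stream: two ON events and one OFF event at the same pixel.
events = [(0.0, 3, 2, 1), (0.1, 3, 2, 1), (0.2, 3, 2, 0)]
frame = events_to_frame(events, height=4, width=5)
# frame[2, 3] ends up at +1 + 1 - 1 = 1
```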
DOI: 10.1109/AICAS57966.2023.10168635
Citations: 0
Temporal Similarity-Based Computation Reduction for Video Transformers in Edge Camera Nodes
Udari De Alwis, Zhongheng Xie, Massimo Alioto
Recognizing human actions in video sequences has become an essential task in video surveillance applications. In such applications, transformer models have rapidly gained wide interest thanks to their performance. However, their advantages come at the price of high computation and memory cost, especially when they must be deployed on edge devices. In this work, temporal similarity tunnel insertion is utilized to reduce the overall computation burden of video transformer networks in action recognition tasks. Furthermore, an edge-friendly video transformer model based on temporal similarity is proposed, which substantially reduces the computation cost. Its smaller variant, EMViT, achieves a 38% computation reduction on the UCF101 dataset while keeping the accuracy degradation insignificant (<0.02%). The larger variant, CMViT, reduces computation by 14% (13%) with an accuracy degradation of 2% (3%) on the scaled Kinetics400 and Jester datasets.
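Temporal-similarity gating can be illustrated with a cosine-similarity test between co-located tokens of consecutive frames: tokens above a threshold can reuse the previous frame's computation. This is a hypothetical sketch of the general idea, not the paper's exact tunnel-insertion mechanism; the threshold value and token layout are assumed.

```python
import numpy as np

def reuse_mask(prev_tokens, curr_tokens, threshold=0.95):
    """Return a boolean mask flagging tokens whose cosine similarity to
    the co-located token of the previous frame exceeds the threshold.
    Computation for flagged tokens can be skipped and the previous
    result reused."""
    num = (prev_tokens * curr_tokens).sum(axis=-1)
    den = (np.linalg.norm(prev_tokens, axis=-1)
           * np.linalg.norm(curr_tokens, axis=-1) + 1e-9)
    return (num / den) >= threshold

# Three 2-d tokens: token 0 barely changes, token 1 flips, token 2 is identical.
prev = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
curr = np.array([[1.0, 0.01], [1.0, 0.0], [1.0, 1.0]])
mask = reuse_mask(prev, curr)
# mask is [True, False, True]: tokens 0 and 2 reuse cached results
```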
DOI: 10.1109/AICAS57966.2023.10168610
Citations: 0
LungHeart-AtMe: Adventitious Cardiopulmonary Sounds Classification Using MMoE with STFT and MFCCs Spectrograms
Changyan Chen, Qing Zhang, Shirui Sheng, Huajie Huang, Yuhang Zhang, Yongfu Li
Adventitious cardiopulmonary (lung and heart) sound detection and classification through a digital stethoscope plays a vital role in early diagnosis and telehealth services. However, automatically detecting adventitious sounds is challenging, since lung and heart sounds are easily susceptible to each other's interference and to noise. In this paper, for the first time, we simultaneously classify adventitious lung and heart sounds using our proposed LungHeart-AtMe model, trained on a mixture of the ICBHI 2017 lung sounds dataset and the PhysioNet 2016 heart sounds dataset. Based on the characteristics of lung and heart sounds, Wavelet Decomposition is first applied for noise reduction; then two time-frequency feature extraction techniques, Short Time Fourier Transform (STFT) and Mel Frequency Cepstral Coefficients (MFCCs), are used to extract preliminary features and transform the sound data into spectrograms that are easy to analyze. The LungHeart-AtMe model is further improved by introducing an MMoE structure and an attention-based CNN that extends its global feature extraction capability. In our experiments, LungHeart-AtMe achieves a Sensitivity of 71.55% and a Specificity of 28.06% for cardiopulmonary sound classification.
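The STFT front end described above fits in a few lines; below is a minimal Hann-windowed magnitude-spectrogram sketch. The MFCC branch (mel filterbank, log, and DCT on top of this) is omitted, and the sampling rate and window sizes are assumptions, not the paper's settings.

```python
import numpy as np

def stft_spectrogram(x, n_fft=256, hop=128):
    """Magnitude STFT spectrogram with a Hann window.

    Returns an array of shape (n_frames, n_fft // 2 + 1): one row per
    analysis frame, one column per non-negative frequency bin.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

fs = 4000                       # assumed sampling rate (Hz)
t = np.arange(fs) / fs          # 1 s of signal
spec = stft_spectrogram(np.sin(2 * np.pi * 100 * t))  # pure 100 Hz tone
# The spectral peak lands near bin 100 / (fs / n_fft) = 6.4 in every frame.
```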
DOI: 10.1109/AICAS57966.2023.10168624
Citations: 1
RC-GNN: Fast and Accurate Signoff Wire Delay Estimation with Customized Graph Neural Networks
Linyu Zhu, Yue Gu, Xinfei Guo
As interconnect delay becomes more dominant than gate delay in a timing path, accurate yet fast estimation of wire delay during the signoff stage is required. Prior machine learning-based wire delay estimation approaches either relied on tedious feature extraction processes or failed to capture the net topology information, incurring long turnaround times. In this paper, we propose to leverage the power of graph neural networks (GNNs) to estimate interconnect delays during signoff. Different from other GNN-assisted timing analysis methods, which are usually applied to a netlist, we harness global message-passing graph representation learning directly on the RC graph to perform ultra-fast net delay estimation without requiring extra features. Furthermore, pre-processed graph features can be added to boost the estimation accuracy at a slight runtime penalty. Our customized GNN models have been evaluated on an industrial design and compared against a state-of-the-art ML-based wire delay estimator. The proposed model outperforms the state-of-the-art ML-based signoff wire delay estimator by 4× in runtime while achieving similar accuracy.
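For context, the classical closed-form baseline for RC wire delay is the Elmore delay, computable in two linear passes over the RC tree. The sketch below uses hypothetical toy resistances and capacitances; it is not the paper's model.

```python
def elmore_delay(parent, R, C):
    """Elmore delay at every node of an RC tree.

    parent[i] is the upstream node of i (root 0 has parent -1); R[i] is
    the resistance of the wire segment into node i and C[i] the node
    capacitance. Assumes children are indexed after their parents.
    Pass 1 sums downstream capacitance bottom-up; pass 2 accumulates
    R * downstream-cap along each root-to-node path.
    """
    n = len(parent)
    cap_down = C[:]                       # total cap in each node's subtree
    for i in range(n - 1, 0, -1):
        cap_down[parent[i]] += cap_down[i]
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + R[i] * cap_down[i]
    return delay

# Toy net: root(0) -> 1 -> 2, plus a branch root(0) -> 3.
parent = [-1, 0, 1, 0]
R = [0.0, 1.0, 2.0, 1.0]                  # ohms (hypothetical)
C = [0.0, 1.0, 1.0, 1.0]                  # farads (hypothetical)
d = elmore_delay(parent, R, C)
# d == [0.0, 2.0, 4.0, 1.0]
```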
DOI: 10.1109/AICAS57966.2023.10168562
Citations: 0
A Column-Parallel Time-Interleaved SAR/SS ADC for Computing in Memory with 2-8bit Reconfigurable Resolution
Yuandong Li, Li Du, Yuan Du
Computing in Memory (CiM), a computing paradigm with a non-von Neumann architecture, has been reported as one of the most promising neural network accelerators for the future. Compared with digital computation, CiM uses RAM arrays to compute and store in the analog domain, avoiding the high delay and energy consumption caused by data transfer. However, the computational results require data converters for quantization, which often limits the development of high-performance CiMs. In this work, we propose a 2-8 bit reconfigurable time-interleaved hybrid ADC architecture for high-speed CiMs, comprising successive-approximation and single-slope stages. Reconfigurability introduces a trade-off between resolution and conversion speed for ADCs in different computing scenarios. A prototype was implemented in a 55 nm CMOS technology; it occupies an area of 330 μm × 13 μm and consumes 1.429 mW in 8-bit conversion mode. With a Nyquist-frequency input sampled at 350 MS/s, the SNDR and SFDR are 40.93 dB and 51.08 dB, respectively. The resultant Walden figure of merit is 44.8 fJ/conv.
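The reported Walden figure of merit can be reproduced from the numbers in the abstract, using FoM = P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76) / 6.02:

```python
def walden_fom(power_w, sndr_db, fs_hz):
    """Walden figure of merit in J/conversion-step:
    FoM = P / (2**ENOB * fs), ENOB = (SNDR - 1.76) / 6.02."""
    enob = (sndr_db - 1.76) / 6.02
    return power_w / (2 ** enob * fs_hz)

# Values reported in the abstract: 1.429 mW, 40.93 dB SNDR, 350 MS/s.
fom = walden_fom(1.429e-3, 40.93, 350e6)
print(f"{fom * 1e15:.1f} fJ/conv")  # prints 44.9 fJ/conv; the abstract's
                                    # 44.8 fJ/conv agrees to rounding
```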
DOI: 10.1109/AICAS57966.2023.10168604
Citations: 0
In-memory Activation Compression for GPT Training
Seungyong Lee, Geonu Yun, Hyuk-Jae Lee
Recently, the large number of parameters in Transformer-based language models has caused memory shortages during training. Although solutions such as mixed precision and model parallelism have been proposed, they induce communication overhead and require the programmer to modify the model. To address this issue, we propose a scheme that compresses activation data in memory, reducing memory usage during training in a user-transparent manner. The compression algorithm gathers activation data into a block and compresses it, using base-delta compression for the exponent and bit-plane zero compression for the sign and mantissa. The important bits are then arranged in order, and LSB truncation is applied to fit the target size. The proposed compression algorithm achieves a compression ratio of 2.09 for the sign, 2.04 for the exponent, and 1.21 for the mantissa. A compression ratio of 3.2 is obtained when truncation is also applied, and we confirm that GPT-2 training converges under the compression.
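The base-delta step for exponents can be sketched as follows: each block stores one base exponent plus narrow deltas, falling back to uncompressed storage when a delta overflows the delta width. This is a generic illustration of base-delta compression, not the paper's exact bit layout.

```python
import numpy as np

def base_delta_encode(exponents, delta_bits=4):
    """Encode a block of exponent fields as (base, narrow unsigned deltas).

    base is the block minimum, so all deltas are non-negative; a block
    whose largest delta does not fit in delta_bits returns None and would
    be stored uncompressed by the caller."""
    base = int(exponents.min())
    deltas = exponents - base
    if deltas.max() >= 2 ** delta_bits:
        return None                      # incompressible block
    return base, deltas.astype(np.uint8)

def base_delta_decode(base, deltas):
    """Reconstruct the original exponent block."""
    return base + deltas.astype(np.int32)

# Four fp32 exponent fields clustered near 120: 4-bit deltas suffice,
# replacing 4 x 8 bits with 8 bits of base + 4 x 4 bits of deltas.
block = np.array([120, 122, 121, 125], dtype=np.int32)
encoded = base_delta_encode(block)
```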
DOI: 10.1109/AICAS57966.2023.10168658
Citations: 0
Low-Power Convolutional Neural Network Accelerator on FPGA
Kasem Khalil, Ashok Kumar V, M. Bayoumi
Convolutional Neural Network (CNN) accelerators are highly beneficial for mobile and resource-constrained devices, and one key research challenge is designing a power-economical accelerator. This paper proposes a CNN accelerator with low power consumption and acceptable performance. The proposed method pipelines the kernels used in the convolution process through a shared multiplication-and-accumulation block: the kernels operate concurrently, each performing a different operation in sequence. It also utilizes a series of operations between the kernels and memory weights to speed up the convolution process. The proposed accelerator is implemented in VHDL on an Altera Arria 10 GX FPGA. The results show that the proposed method achieves 26.37 GOPS/W, a lower energy consumption than the existing method, with acceptable resource usage and performance. The proposed method is well suited for small and constrained devices.
DOI: 10.1109/AICAS57966.2023.10168646
Citations: 0
A Lightweight Convolutional Neural Network for Atrial Fibrillation Detection Using Dual-Channel Binary Features from Single-Lead Short ECG
Jiahao Liu, Xinyu Liu, Liang Zhou, L. Chang, Jun Zhou
Atrial fibrillation (AF) is a prevalent cardiovascular disease in the elderly, significantly increasing the risk of stroke, heart failure, and other conditions. While artificial neural networks (ANNs) have recently demonstrated high accuracy in ECG-based AF detection, their high computational complexity makes real-time, long-term monitoring on low-power wearable devices challenging, which is critical for detecting paroxysmal AF. Therefore, in this work, a lightweight convolutional neural network for AF detection is proposed, using a dual-channel binary feature extraction technique on single-lead short ECG to achieve both high classification accuracy and low computational complexity. Evaluated on the 2017 PhysioNet/CinC Challenge dataset, the proposed method achieves 93.6% sensitivity and a 0.81 F1 score for AF detection. Moreover, the design uses only 1.83M parameters, up to a 27× reduction compared with prior works, and needs only 57M MACs for computation. As a result, it is suitable for deployment in low-power wearable devices for long-term AF monitoring.
DOI: 10.1109/AICAS57966.2023.10168645
Citations: 1
A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application
D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand
A robust, fully differential multiplication-and-accumulate (MAC) scheme for analog compute-in-memory (CIM) architectures is proposed in this article. The proposed method achieves a high signal margin for a 4-bit CIM architecture thanks to fully differential voltage changes on the read bit-lines (RBL/RBLBs). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14×, 5.82×, and 10.24× higher than the state of the art. The proposed scheme is robust against process, voltage, and temperature (PVT) variations and achieves a variability metric (σ/µ) of 3.64%, which is 2.36× and 2.66× lower than reported works. The architecture achieves an energy efficiency of 2.53 TOPS/W at a 1 V supply voltage in 65 nm CMOS technology, 6.2× more efficient than a digital baseline HW [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST dataset with a LeNet-5 CNN model. The figure of merit (FoM) of the proposed design is 355, which is 3.28×, 3.58×, and 17.75× higher than the state of the art.
{"title":"A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application","authors":"D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand","doi":"10.1109/AICAS57966.2023.10168599","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168599","url":null,"abstract":"A robust, fully differential multiplication and accumulate (MAC) scheme for analog compute-in-memory (CIM) architecture is proposed in this article. The proposed method achieves a high signal margin for 4-bit CIM architecture due to fully differential voltage changes on read bit-lines (RBL/RBLBs). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14×, 5.82×, and 10.24× higher than the state-of-the-art. The proposed scheme is robust against the process, voltage, and temperature (PVT) variations and achieves a variability metric (σ/µ) of 3.64 %, which is 2.36× and 2.66× lower than the reported works. The architecture has achieved an energy-efficiency of 2.53 TOPS/W at 1 V supply voltage in 65 nm CMOS technology, that is 6.2× efficient than digital baseline HW [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST data set with a LeNet-5 CNN model. The figure-of-merit (FoM) of the proposed design is 355, which is 3.28×, 3.58×, and 17.75× higher than state-of-the-art.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114512651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
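The variability metric σ/µ quoted in the abstract above is the standard coefficient of variation: the standard deviation of a quantity (here, the MAC signal margin under PVT variation) divided by its mean. A minimal sketch of how it is computed over Monte Carlo samples — the spread below is synthetic, not data from the paper:

```python
# Coefficient of variation (sigma/mu) over Monte Carlo samples.
# The nominal 32 mV margin matches the abstract; the 1.2 mV spread is an
# assumed value used only to illustrate the calculation.
import math
import random

def coefficient_of_variation(samples):
    """sigma/mu using the population standard deviation."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return math.sqrt(var) / mu

random.seed(0)
margins_mv = [random.gauss(32.0, 1.2) for _ in range(10000)]

cv = coefficient_of_variation(margins_mv)
print(f"sigma/mu = {cv * 100:.2f}%")  # roughly 3.7% for this synthetic spread
```

A lower σ/µ means the read margin stays tight across PVT corners, which is why the abstract reports it as a robustness figure.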
Journal
2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)