
Latest publications from the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Enhancing Fault Resilience of QNNs by Selective Neuron Splitting
Mohammad Hasan Ahmadilivani, Mahdi Taheri, J. Raik, M. Daneshtalab, M. Jenihhin
The superior performance of Deep Neural Networks (DNNs) has led to their application in various aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators; however, they are more prone to reliability issues. In this paper, a recent analytical resilience assessment method is adapted for QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). Thereafter, a novel method for splitting the critical neurons is proposed that enables the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part. The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed fault-correction method incurs half the overhead of selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.
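To make the splitting idea concrete, below is a minimal NumPy sketch of one plausible neuron-splitting rule for a fully connected layer: each critical neuron (ranked by NVF) is duplicated and the outgoing weights of both copies are halved, so the fault-free output is unchanged while a single fault perturbs it by at most half as much. The function name and layout are illustrative assumptions, not the authors' LCU design.

```python
import numpy as np

def split_critical_neurons(W_in, W_out, nvf, k):
    """Duplicate the k highest-NVF neurons and halve the outgoing weights
    of both copies, so the fault-free layer output is unchanged.

    W_in:  (n_neurons, n_inputs)  incoming weights of a layer
    W_out: (n_outputs, n_neurons) outgoing weights of the next layer
    nvf:   (n_neurons,)           vulnerability score per neuron
    """
    critical = np.argsort(nvf)[-k:]                 # k most vulnerable neurons
    W_in_new = np.vstack([W_in, W_in[critical]])    # copies see the same inputs
    W_out_new = np.hstack([W_out, W_out[:, critical]])
    W_out_new[:, critical] *= 0.5                   # original copy carries half
    W_out_new[:, -k:] *= 0.5                        # duplicate carries the other half
    return W_in_new, W_out_new
```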
{"title":"Enhancing Fault Resilience of QNNs by Selective Neuron Splitting","authors":"Mohammad Hasan Ahmadilivani, Mahdi Taheri, J. Raik, M. Daneshtalab, M. Jenihhin","doi":"10.1109/AICAS57966.2023.10168633","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168633","url":null,"abstract":"The superior performance of Deep Neural Networks (DNNs) has led to their application in various aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators, however, they are more prone to reliability issues.In this paper, a recent analytical resilience assessment method is adapted for QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). Thereafter, a novel method for splitting the critical neurons is proposed that enables the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part.The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed method for correcting the faults has a twice smaller overhead than a selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132951417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Temporal Similarity-Based Computation Reduction for Video Transformers in Edge Camera Nodes
Udari De Alwis, Zhongheng Xie, Massimo Alioto
Recognizing human actions in video sequences has become an essential task in video surveillance applications. In such applications, transformer models have rapidly gained wide interest thanks to their performance. However, their advantages come at a high computational and memory cost, especially when they must be deployed on edge devices. In this work, temporal similarity tunnel insertion is utilized to reduce the overall computation burden of video transformer networks in action recognition tasks. Furthermore, an edge-friendly video transformer model based on temporal similarity is proposed, which substantially reduces the computation cost. Its smaller variant, EMViT, achieves a 38% computation reduction on the UCF101 dataset while keeping the accuracy degradation insignificant (<0.02%). The larger variant, CMViT, reduces computation by 14% (13%) with an accuracy degradation of 2% (3%) on the scaled Kinetics-400 and Jester datasets, respectively.
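As a rough illustration of similarity-driven computation reuse (a generic sketch, not the paper's exact tunnel-insertion mechanism), the snippet below recomputes a token-wise sublayer only for tokens whose embedding changed between consecutive frames; `f`, `tau`, and the tensor layout are assumptions.

```python
import torch

@torch.no_grad()
def reuse_by_similarity(prev_x, curr_x, prev_y, f, tau=0.95):
    """prev_x, curr_x: (tokens, dim) embeddings of two consecutive frames;
    prev_y: cached f(prev_x); f: a token-wise sublayer such as an MLP."""
    sim = torch.nn.functional.cosine_similarity(prev_x, curr_x, dim=-1)
    changed = sim < tau                  # only these tokens are recomputed
    y = prev_y.clone()                   # reuse cached outputs for the rest
    if changed.any():
        y[changed] = f(curr_x[changed])
    return y
```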
{"title":"Temporal Similarity-Based Computation Reduction for Video Transformers in Edge Camera Nodes","authors":"Udari De Alwis, Zhongheng Xie, Massimo Alioto","doi":"10.1109/AICAS57966.2023.10168610","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168610","url":null,"abstract":"Recognizing human actions in video sequences has become an essential task in video surveillance applications. In such applications, transformer models have rapidly gained wide interest thanks to their performance. However, their advantages come at the cost of a high computational and memory cost, especially when they need to be incorporated in edge devices. In this work, temporal similarity tunnel insertion is utilized to reduce the overall computation burden in video transformer networks in action recognition tasks. Furthermore, an edge-friendly video transformer model is proposed based on temporal similarity, which substantially reduces the computation cost. Its smaller variant EMViT achieves 38% computation reduction under the UCF101 dataset, while keeping the accuracy degradation insignificant (<0.02%). Also, the larger variant CMViT reduces computation by 14% (13%) with an accuracy degradation of 2% (3%) in scaled Kinetic400 and Jester datasets.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132076976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SNNOpt: An Application-Specific Design Framework for Spiking Neural Networks
Jingyu He, Ziyang Shen, Fengshi Tian, Jinbo Chen, Jie Yang, M. Sawan, Hsiang-Ting Chen, P. Bogdan, C. Tsui
We propose SNNOpt, a systematic application-specific hardware design methodology for Spiking Neural Networks (SNNs), which consists of three novel phases: 1) an Ollivier-Ricci-curvature (ORC)-based, architecture-aware network partitioning, 2) a reinforcement learning mapping strategy, and 3) a Bayesian optimization algorithm for NoC design space exploration. Experimental results show that SNNOpt achieves 47.45% less runtime and 58.64% energy savings over state-of-the-art approaches.
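Phase 3 is standard Bayesian optimization over a discrete NoC design space. A minimal sketch using scikit-optimize follows; the knob names (mesh size, buffer depth, virtual channels) and the toy cost function are placeholders for an actual NoC simulator, not the paper's setup.

```python
from skopt import gp_minimize
from skopt.space import Integer

# Hypothetical NoC knobs; a real objective would invoke a NoC simulator
# and return a latency/energy cost for the mapped SNN.
space = [Integer(2, 8, name="mesh_dim"),
         Integer(2, 16, name="buffer_depth"),
         Integer(1, 8, name="virtual_channels")]

def noc_cost(params):
    mesh, depth, vcs = params
    # Toy stand-in: penalize small meshes (congestion) and deep buffers (energy).
    return 100.0 / (mesh * vcs) + 0.5 * depth

result = gp_minimize(noc_cost, space, n_calls=25, random_state=0)
print("best config:", result.x, "cost:", result.fun)
```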
{"title":"SNNOpt: An Application-Specific Design Framework for Spiking Neural Networks","authors":"Jingyu He, Ziyang Shen, Fengshi Tian, Jinbo Chen, Jie Yang, M. Sawan, Hsiang-Ting Chen, P. Bogdan, C. Tsui","doi":"10.1109/AICAS57966.2023.10168605","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168605","url":null,"abstract":"We propose a systematic application-specific hardware design methodology for designing Spiking Neural Network (SNN), SNNOpt, which consists of three novel phases: 1) an Olliver-Ricci-Curvature (ORC)-based architecture-aware network partitioning, 2) a reinforcement learning mapping strategy, and 3) a Bayesian optimization algorithm for NoC design space exploration. Experimental results show that SNNOpt achieves a 47.45% less runtime and 58.64% energy savings over state-of-the-art approaches.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132385149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Novel Knowledge Distillation to Improve Training Accuracy of Spin-based SNN
Hanrui Li, Aijaz H. Lone, Fengshi Tian, Jie Yang, M. Sawan, Nazek El‐Atab
Spintronics-based magnetic tunnel junction (MTJ) devices have shown the ability to work as both synapses and spiking threshold neurons, which makes them well suited to hardware implementations of spiking neural networks (SNNs). They have the inherent advantage of high energy efficiency at ultra-low operating voltage, owing to their small nanometric size and low depinning current densities. However, hardware-based SNN training typically suffers a significant performance loss compared with the original neural network, due to variations among devices and to the information deficiency incurred when weights are mapped to device synaptic conductances. Knowledge distillation is a model compression and acceleration method that transfers learned knowledge from a large machine learning model to a smaller one with minimal loss in performance. In this paper, we propose a novel training scheme based on spike knowledge distillation that improves the training performance of a spin-based SNN (SSNN) model by transferring knowledge from a large CNN model. We propose novel distillation methodologies and demonstrate the effectiveness of the proposed method with detailed experiments on four datasets. The experimental results indicate that our proposed training scheme consistently improves the performance of the SSNN model by a large margin.
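A generic spike-KD loss along these lines (a sketch of standard logit distillation adapted to rate-decoded SNN outputs, not the paper's exact methodology) could look like:

```python
import torch
import torch.nn.functional as F

def spike_distillation_loss(student_spikes, teacher_logits, labels, T=4.0, alpha=0.7):
    """student_spikes: (time_steps, batch, classes) SNN outputs over time.
    Rate-decode the student, then mix hard cross-entropy with a softened
    KL term toward the CNN teacher (classic Hinton-style distillation)."""
    student_logits = student_spikes.mean(dim=0)      # rate decoding over time steps
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * soft + (1 - alpha) * hard
```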
{"title":"Novel Knowledge Distillation to Improve Training Accuracy of Spin-based SNN","authors":"Hanrui Li, Aijaz H. Lone, Fengshi Tian, Jie Yang, M. Sawan, Nazek El‐Atab","doi":"10.1109/AICAS57966.2023.10168575","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168575","url":null,"abstract":"Spintronics-based magnetic tunnel junction (MTJ) devices have shown the ability working as both synapse and spike threshold neurons, which is perfectly suitable with the hardware implementation of spike neural network (SNN). It has the inherent advantage of high energy efficiency with ultra-low operation voltage due to its small nanometric size and low depinning current densities. However, hardware-based SNNs training always suffers a significant performance loss compared with original neural networks due to variations among devices and information deficiency as the weights map with device synaptic conductance. Knowledge distillation is a model compression and acceleration method that enables transferring the learning knowledge from a large machine learning model to a smaller model with minimal loss in performance. In this paper, we propose a novel training scheme based on spike knowledge distillation which helps improve the training performance of spin-based SNN (SSNN) model via transferring knowledge from a large CNN model. We propose novel distillation methodologies and demonstrate the effectiveness of the proposed method with detailed experiments on four datasets. The experimental results indicate that our proposed training scheme consistently improves the performance of SSNN model by a large margin.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126632409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Column-Parallel Time-Interleaved SAR/SS ADC for Computing in Memory with 2-8bit Reconfigurable Resolution
Yuandong Li, Li Du, Yuan Du
Computing in Memory (CiM), a computing paradigm with a non-von Neumann architecture, has been reported as one of the most promising neural network accelerators for the future. Compared with digital computation, CiM uses RAM arrays to compute and store in the analog domain, avoiding the high delay and energy consumption caused by data transfer. However, the computational results require data converters for quantization, which often limits the development of high-performance CiMs. In this work, we propose a 2-8-bit reconfigurable time-interleaved hybrid ADC architecture for high-speed CiMs, comprising successive-approximation and single-slope stages. Reconfigurability introduces a trade-off between resolution and conversion speed for ADCs in different computing scenarios. A prototype was implemented in a 55 nm CMOS technology; it occupies an area of 330 μm × 13 μm and consumes 1.429 mW in 8-bit conversion mode. With a Nyquist-frequency input sampled at 350 MS/s, the SNDR and SFDR are 40.93 dB and 51.08 dB, respectively. The resultant Walden figure of merit is 44.8 fJ/conv.
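The reported figure of merit can be checked from the quoted measurements, using the Walden definition FoM = P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76)/6.02:

```python
# Walden figure of merit from the abstract's measured values.
P, SNDR, fs = 1.429e-3, 40.93, 350e6      # power [W], SNDR [dB], sample rate [S/s]
ENOB = (SNDR - 1.76) / 6.02               # ≈ 6.51 effective bits
FoM = P / (2**ENOB * fs)                  # ≈ 4.49e-14 J/conv ≈ 44.9 fJ/conv
print(f"ENOB = {ENOB:.2f} b, FoM = {FoM*1e15:.1f} fJ/conv")
# Matches the quoted 44.8 fJ/conv up to rounding of the inputs.
```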
{"title":"A Column-Parallel Time-Interleaved SAR/SS ADC for Computing in Memory with 2-8bit Reconfigurable Resolution","authors":"Yuandong Li, Li Du, Yuan Du","doi":"10.1109/AICAS57966.2023.10168604","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168604","url":null,"abstract":"Computing in Memory (CiM), as a computing system with non-von Neumann architecture, has been reported as one of the most promising neural network accelerators in the future. Compared with digital-based computation, CiM uses RAM arrays to calculate and store in the analog domain, avoiding the high delay and energy consumption caused by data transfer. However, the computational results require data converters for quantization, which often limits the development of high-performance CiMs. In this work, we propose a 2-8bit reconfigurable time-interleaved hybrid ADC architecture for high-speed CiMs, including successive approximation and single-slope stages. Reconfigurability introduces a trade-off between resolution and conversion speed for ADCs in different computing scenarios. A prototype was implemented in a 55 nm CMOS technology, which occupies an area of 330μm × 13μm and consumes a power of 1.429mW at 8-bit conversion mode. With a Nyquist frequency input sampled at 350 MS/s, the SNDR and SFDR are 40.93 dB and 51.08 dB, respectively. The resultant Walden figure of merit is 44.8 fJ/conv.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123237127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Low-Power Convolutional Neural Network Accelerator on FPGA
Kasem Khalil, Ashok Kumar V, M. Bayoumi
Convolutional Neural Network (CNN) accelerators are highly beneficial for mobile and resource-constrained devices. One of the research challenges is to design a power-efficient accelerator. This paper proposes a CNN accelerator with low power consumption and acceptable performance. The proposed method pipelines the kernels used in the convolution process and shares a single multiplication-and-accumulation block among them. The kernels operate consecutively, each performing a different operation in sequence. The proposed method utilizes a series of operations between the kernels and memory weights to speed up the convolution process. The proposed accelerator is implemented in VHDL on an FPGA Altera Arria 10 GX. The results show that the proposed method achieves an energy efficiency of 26.37 GOPS/W, consuming less energy than the existing method, with acceptable resource usage and performance. The proposed method is ideally suited for small and constrained devices.
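A behavioral sketch of the shared-MAC idea (an illustrative Python model, not the VHDL implementation): each input window is fetched once, and the kernels take turns accumulating on the same MAC block.

```python
import numpy as np

def shared_mac_conv(ifmap, kernels):
    """Behavioral model of several conv kernels time-sharing one MAC block:
    at each output position the window is fetched once, and the kernels
    take turns issuing their multiply-accumulates over it."""
    K = kernels.shape[-1]                      # kernels: (n_kernels, K, K)
    H = ifmap.shape[0] - K + 1
    W = ifmap.shape[1] - K + 1
    out = np.zeros((len(kernels), H, W))
    for i in range(H):
        for j in range(W):
            window = ifmap[i:i+K, j:j+K]       # fetched once, reused by all kernels
            for n, kern in enumerate(kernels): # kernels pipeline through the MAC
                out[n, i, j] = np.sum(window * kern)
    return out
```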
{"title":"Low-Power Convolutional Neural Network Accelerator on FPGA","authors":"Kasem Khalil, Ashok Kumar V, M. Bayoumi","doi":"10.1109/AICAS57966.2023.10168646","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168646","url":null,"abstract":"Convolutional Neural Network (CNN) accelerator is highly beneficial for mobile and resource-constrained devices. One of the research challenges is to design a power-economic accelerator. This paper proposes a CNN accelerator with low power consumption and acceptable performance. The proposed method uses pipelining between the used kernels for the convolution process and a shared multiplication and accumulation block. The available kernels work consequently while each one performs a different operation in sequence. The proposed method utilizes a series of operations between the kernels and memory weights to speed up the convolution process. The proposed accelerator is implemented using VHDL and FPGA Altera Arria 10 GX. The results show that the proposed method achieves 26.37 GOPS/W of energy consumption, which is lower than the existing method, with acceptable resource usage and performance. The proposed method is ideally suited for small and constrained devices.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123555096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RC-GNN: Fast and Accurate Signoff Wire Delay Estimation with Customized Graph Neural Networks
Linyu Zhu, Yue Gu, Xinfei Guo
As interconnect delay becomes more dominant than gate delay in a timing path, accurate yet fast estimation of wire delay during the signoff stage is required. Prior machine-learning-based wire delay estimation approaches either relied on tedious feature extraction processes or failed to capture the net topology information, incurring long turnaround times. In this paper, we propose to leverage the power of graph neural networks (GNNs) to estimate interconnect delays during signoff. Different from other GNN-assisted timing analysis methods, which are usually applied to a netlist, we harness global message-passing graph representation learning directly on the RC graph to perform ultra-fast net delay estimation without requiring extra features. Furthermore, pre-processed graph features can be added to boost the estimation accuracy with a slight runtime penalty. Our proposed customized GNN models have been evaluated on an industrial design and compared against the state-of-the-art ML-based wire delay estimator. The results show that the proposed model outperforms the state-of-the-art ML-based signoff wire delay estimator by 4x in runtime while achieving similar accuracy.
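A minimal message-passing model over an RC graph, sketched in plain PyTorch under assumed node features [upstream resistance, node capacitance]; the architecture and feature choice are illustrative, not the paper's RC-GNN.

```python
import torch
import torch.nn as nn

class RCDelayGNN(nn.Module):
    """Toy message-passing sketch over an RC graph: embed per-node [R, C]
    features, run a few rounds of neighbor aggregation, and read out a
    predicted delay per node (e.g., to each sink)."""
    def __init__(self, dim=32, rounds=3):
        super().__init__()
        self.embed = nn.Linear(2, dim)
        self.msg = nn.Linear(2 * dim, dim)
        self.readout = nn.Linear(dim, 1)
        self.rounds = rounds

    def forward(self, x, edges):
        # x: (n_nodes, 2) node features; edges: (2, n_edges) directed (src, dst)
        h = torch.relu(self.embed(x))
        src, dst = edges
        for _ in range(self.rounds):
            m = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
            h = h + torch.zeros_like(h).index_add_(0, dst, m)  # residual update
        return self.readout(h).squeeze(-1)    # predicted delay per node
```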
{"title":"RC-GNN: Fast and Accurate Signoff Wire Delay Estimation with Customized Graph Neural Networks","authors":"Linyu Zhu, Yue Gu, Xinfei Guo","doi":"10.1109/AICAS57966.2023.10168562","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168562","url":null,"abstract":"As interconnect delay becomes more dominate in a timing path compared to the gate delay, accurate yet fast estimation of wire delay during the signoff stage is required. Prior machine learning-based wire delay estimation approaches either relied on tedious feature extraction processes or failed to capture the net topology information, incurring long turn around time. In this paper, we propose to leverage the power of graph neural networks (GNN) to estimate the interconnect delays during signoff. Different from other GNN-assisted timing analysis methods that were usually applied to a netlist, we harness the global message passing graph representation learning on RC graph directly to perform ultra-fast net delay estimation without requiring extra features. Furthermore, pre-processed graph features can be added to boost the estimation accuracy with slight run time penalty. Our proposed customized GNN models have been evaluated with the industrial design and compared against state of the art ML-based wire delay estimator. It shows that the proposed model outperforms the state-of-the-art ML-based signoff wire delay estimator by 4x in terms of run time while achieving similar accuracy levels.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125176141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Lightweight Convolutional Neural Network for Atrial Fibrillation Detection Using Dual-Channel Binary Features from Single-Lead Short ECG
Jiahao Liu, Xinyu Liu, Liang Zhou, L. Chang, Jun Zhou
Atrial fibrillation (AF) is a prevalent cardiovascular disease in the elderly that significantly increases the risk of stroke and heart failure. While artificial neural networks (ANNs) have recently demonstrated high accuracy in ECG-based AF detection, their high computational complexity makes real-time, long-term monitoring on low-power wearable devices challenging, which is critical for detecting paroxysmal AF. Therefore, in this work, a lightweight convolutional neural network for AF detection is proposed that uses a dual-channel binary feature extraction technique on single-lead short ECG, achieving both high classification accuracy and low computational complexity. Evaluated on the 2017 PhysioNet/CinC Challenge dataset, the proposed method achieves 93.6% sensitivity and an F1 score of 0.81 for AF detection. Moreover, the design uses only 1.83M parameters, up to a 27x reduction compared with prior works, and needs only 57M MACs for computation. As a result, it is suitable for deployment in low-power wearable devices for long-term AF monitoring.
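One plausible reading of "dual-channel binary features" — stated as an assumption, since the abstract does not define them — is a two-channel binarization of the raw ECG, e.g. an amplitude threshold plus the slope sign:

```python
import numpy as np

def dual_channel_binary_features(ecg):
    """Hypothetical dual-channel binarization of a single-lead ECG segment:
    channel 0 thresholds the normalized amplitude, channel 1 binarizes the
    slope sign. Not necessarily the paper's exact feature definition."""
    x = (ecg - ecg.mean()) / (ecg.std() + 1e-8)
    ch0 = (x > 0.5).astype(np.int8)                        # amplitude above threshold
    ch1 = (np.diff(x, prepend=x[0]) > 0).astype(np.int8)   # rising slope
    return np.stack([ch0, ch1])                            # (2, num_samples)
```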
{"title":"A Lightweight Convolutional Neural Network for Atrial Fibrillation Detection Using Dual-Channel Binary Features from Single-Lead Short ECG","authors":"Jiahao Liu, Xinyu Liu, Liang Zhou, L. Chang, Jun Zhou","doi":"10.1109/AICAS57966.2023.10168645","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168645","url":null,"abstract":"Atrial fibrillation (AF) is a prevalent cardiovascular disease in the elderly, significantly increasing the risk of stroke and heart failure, etc. While the artificial neural network (ANN) has recently demonstrated high accuracy in ECG-based AF detection, its high computation complexity makes it challenging for real-time and long-term monitoring on low-power wearable devices, which is critical for detecting paroxysmal AF. Therefore, in this work, a lightweight convolutional neural network for AF detection is proposed using a dual-channel binary features extraction technique from single-lead short ECG to achieve both high classification accuracy and low computation complexity, and evaluated on the 2017 PhysioNet/CinC Challenge dataset, the proposed method achieves 93.6% sensitivity and 0.81 F1 score for AF detection. Moreover, this design consumes only 1.83M parameters, achieving up to 27x reductions compared with prior works, and only needs 57M MACs for calculation. As a result, it is suitable for deployment in low-power wearable devices for long-term AF monitoring.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121807942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application
D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand
A robust, fully differential multiply-and-accumulate (MAC) scheme for analog compute-in-memory (CIM) architectures is proposed in this article. The proposed method achieves a high signal margin for a 4-bit CIM architecture thanks to fully differential voltage changes on the read bit-lines (RBL/RBLBs). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14×, 5.82×, and 10.24× higher than the state of the art. The proposed scheme is robust against process, voltage, and temperature (PVT) variations and achieves a variability metric (σ/µ) of 3.64%, which is 2.36× and 2.66× lower than reported works. The architecture achieves an energy efficiency of 2.53 TOPS/W at a 1 V supply voltage in 65 nm CMOS technology, 6.2× more efficient than the digital baseline HW [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST dataset with a LeNet-5 CNN model. The figure of merit (FoM) of the proposed design is 355, which is 3.28×, 3.58×, and 17.75× higher than the state of the art.
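A behavioral sketch of the fully differential readout (illustrative, with a hypothetical per-unit bitline swing `v_step`): positive partial products pull the true bitline, negative ones pull the complement, and their difference is proportional to the signed dot product.

```python
import numpy as np

def differential_mac(inputs, weights, v_step=8e-3):
    """Behavioral model of a fully differential analog MAC column:
    positive partial products accumulate on RBL, negative ones on RBLB,
    and the sense amplifier resolves the difference. v_step is a
    hypothetical per-unit bitline voltage change."""
    prods = inputs * weights                    # signed 4-bit partial products
    v_rbl = v_step * prods[prods > 0].sum()     # swing on the true bitline
    v_rblb = -v_step * prods[prods < 0].sum()   # swing on the complement bitline
    return v_rbl - v_rblb                       # differential signal ∝ dot product
```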
{"title":"A Fully Differential 4-Bit Analog Compute-In-Memory Architecture for Inference Application","authors":"D. Kushwaha, Rajat Kohli, Jwalant Mishra, R. Joshi, S. Dasgupta, B. Anand","doi":"10.1109/AICAS57966.2023.10168599","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168599","url":null,"abstract":"A robust, fully differential multiplication and accumulate (MAC) scheme for analog compute-in-memory (CIM) architecture is proposed in this article. The proposed method achieves a high signal margin for 4-bit CIM architecture due to fully differential voltage changes on read bit-lines (RBL/RBLBs). The signal margin achieved for 4-bit MAC operation is 32 mV, which is 1.14×, 5.82×, and 10.24× higher than the state-of-the-art. The proposed scheme is robust against the process, voltage, and temperature (PVT) variations and achieves a variability metric (σ/µ) of 3.64 %, which is 2.36× and 2.66× lower than the reported works. The architecture has achieved an energy-efficiency of 2.53 TOPS/W at 1 V supply voltage in 65 nm CMOS technology, that is 6.2× efficient than digital baseline HW [25]. Furthermore, the inference accuracy of the architecture is 97.6% on the MNIST data set with a LeNet-5 CNN model. The figure-of-merit (FoM) of the proposed design is 355, which is 3.28×, 3.58×, and 17.75× higher than state-of-the-art.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114512651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism
Zhou Wang, Jingchuang Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, S. Yin
DNN inference on edge devices has long been important, given its large demands on computation and energy. This paper proposes a TPE (Transformation Process Element) with three characteristics. First, the TPE employs a method of Data Segmentation Skip and Pre-Reorganization (DSSPR). Second, it has a Typical Value Matching and Calibration Computer (TVMCC) system, which converts direct calculation into matching-and-calibration calculation. Third, it includes a Data Format Pre-Configuration and Self-Adjustment (DFPCSA) scheme. Compared with UNPU, the most typical pure-inference processor, the TPE achieves 1.25× better energy consumption.
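The abstract does not spell out how TVMCC "converts direct calculation into matching and calibration"; one speculative sketch of such a scheme — nearest typical-value lookup plus a first-order calibration term, all names hypothetical — is:

```python
import numpy as np

def tv_multiply(a, b, typ):
    """Speculative sketch (not the paper's exact TVMCC datapath): map each
    operand to its nearest typical value, fetch the product of typical values
    from a small table, then calibrate with first-order residual terms so the
    remaining error is only (a - ta) * (b - tb)."""
    table = np.outer(typ, typ)          # product table (precomputed once in hardware)
    ia = np.abs(a - typ).argmin()       # match operand a to a typical value
    ib = np.abs(b - typ).argmin()       # match operand b to a typical value
    approx = table[ia, ib]
    return approx + typ[ib] * (a - typ[ia]) + typ[ia] * (b - typ[ib])
```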
{"title":"TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism","authors":"Zhou Wang, Jingchuang Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, S. Yin","doi":"10.1109/AICAS57966.2023.10168614","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168614","url":null,"abstract":"DNN inference of edge devices has been very important for a long time with large computing and energy consumption demand. This paper proposes a TPE(Transformation Process Element) with three characteristics. Firstly, TPE has a method of Data Segmentation Skip and Pre-Reorganization(DSSPR). Secondly, TPE has a Typical Value Matching and Calibration Computer (TVMCC) system, which converts direct calculation into matching and calibration calculation. Thirdly, TPE includes a Data Format Pre-Configuration and Self-Adjustment (DFPCSA) scheme. Compared with the most typical pure reasoning processor UNPU, TPE achieves 1.25× better energy consumption.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114851104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0