2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS): Latest Papers

Embedded neuromorphic attention model leveraging a novel low-power heterogeneous platform
Amélie Gruel, Alfio Di Mauro, Robin Hunziker, L. Benini, Jean Martinet, M. Magno
Neuromorphic computing has been identified as an ideal candidate to exploit the potential of event-based cameras, a promising sensor for embedded computer vision. However, state-of-the-art neuromorphic models try to maximize model performance on large platforms rather than seek a trade-off between memory requirements and performance. We present the first deployment of an embedded neuromorphic algorithm on Kraken, a low-power RISC-V-based SoC prototype that includes a neuromorphic spiking neural network (SNN) accelerator. The model employed in this paper was designed to achieve visual attention detection on event data while minimizing the size of the neuronal populations and the inference latency. Experimental results show that the model achieves saliency detection in event data with a delay of 32 ms, maintains a classification accuracy of 84.51%, and consumes only 3.85 mJ per second of processed input data, all while processing input data 10 times faster than real time. This trade-off between decision latency, power consumption, accuracy, and run time significantly outperforms those achieved by previous implementations on CPU and neuromorphic hardware.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168603 (published 2023-06-11)
Citations: 0
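The energy and speed figures in the Kraken abstract above can be turned into rough power estimates. The sketch below is only back-of-the-envelope arithmetic: it assumes that the 3.85 mJ figure and the 10× speedup describe the same operating point, and it ignores idle power. The function names are hypothetical.

```python
# Figures quoted in the abstract.
ENERGY_PER_INPUT_SECOND_MJ = 3.85   # mJ consumed per second of processed input
SPEEDUP_OVER_REAL_TIME = 10.0       # input is processed 10x faster than real time


def active_power_mw(energy_mj_per_input_s, speedup):
    """Average power while the platform is busy processing (assumed model)."""
    busy_time_s = 1.0 / speedup                   # wall-clock time per second of input
    return energy_mj_per_input_s / busy_time_s    # mJ / s == mW


def duty_cycled_power_mw(energy_mj_per_input_s):
    """Average power if input arrives in real time and idle power is ignored."""
    return energy_mj_per_input_s / 1.0            # one second of input per second


if __name__ == "__main__":
    print(active_power_mw(ENERGY_PER_INPUT_SECOND_MJ, SPEEDUP_OVER_REAL_TIME))
    print(duty_cycled_power_mw(ENERGY_PER_INPUT_SECOND_MJ))
```

Under these assumptions the platform would draw roughly 38.5 mW while busy and average about 3.85 mW when keeping pace with a live sensor.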
Integrating Delta Modulation and Stochastic Computing for Real-time Machine Learning based Heartbeats Monitoring in Wearable Systems
Xiaochen Tang, Shanshan Liu, Farzad Niknia, Wei Tang, P. Reviriego, Fabrizio Lombardi
Real-time electrocardiogram (ECG) monitoring using wearable devices is crucial for early cardiovascular disease diagnosis, and it can be automated using machine learning (ML) algorithms. Unfortunately, wearable devices face stringent hardware resource constraints, so low-complexity designs that can implement ML-based detection of heartbeat anomalies are required. This paper proposes integrating a delta modulator (DM), used to digitize the ECG signal, with a Stochastic Computing (SC) implementation of the ML algorithms. The DM enables a low-cost conversion of the ECG to binary sequences that are then directly processed in the SC implementation of an ML algorithm. This eliminates the need to convert the DM outputs to integers and then to stochastic sequences, so the proposed integrated design considerably reduces the complexity of the system. The proposed scheme has been evaluated on a premature ventricular contraction (PVC) heartbeat recognition system based on a support vector machine classifier. The estimated chip area and power dissipation of the proposed system using a commercial 180 nm CMOS technology are 0.36 mm² and 0.6 µW, respectively, a reduction of more than 38% and 54% in these metrics compared to state-of-the-art solutions, while providing similar performance in terms of heartbeat anomaly detection.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168665 (published 2023-06-11)
Citations: 0
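The two building blocks named in the delta modulation abstract above can be illustrated with textbook versions. This is a minimal sketch, not the paper's circuit: the 1-bit delta modulator with a fixed step and the unipolar stochastic multiplier (bitwise AND of two random bitstreams) are standard forms, and all names, the step size, and the test signal are assumptions chosen for illustration.

```python
import math
import random


def delta_modulate(samples, step):
    """1-bit delta modulation: emit 1 when the input is at or above the running estimate."""
    estimate = 0.0
    bits = []
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step   # staircase approximation of the input
        bits.append(bit)
    return bits


def delta_demodulate(bits, step):
    """Rebuild the staircase approximation from the bitstream."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


def sc_multiply(p_a, p_b, length, seed=0):
    """Unipolar stochastic multiply: AND two independent bitstreams, then average."""
    rng = random.Random(seed)
    stream_a = [1 if rng.random() < p_a else 0 for _ in range(length)]
    stream_b = [1 if rng.random() < p_b else 0 for _ in range(length)]
    return sum(a & b for a, b in zip(stream_a, stream_b)) / length


if __name__ == "__main__":
    # Slow sine wave standing in for an ECG trace (hypothetical test signal).
    signal = [math.sin(2 * math.pi * n / 200) for n in range(400)]
    bits = delta_modulate(signal, step=0.05)
    recon = delta_demodulate(bits, step=0.05)
    print(max(abs(a - b) for a, b in zip(signal, recon)))
    print(sc_multiply(0.5, 0.4, 20000))
```

Because the step (0.05) exceeds the largest sample-to-sample change of the test signal, the staircase tracks it without slope overload. The appeal described in the abstract is that the modulator already produces a bitstream, which is the native data format of stochastic logic.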
E-Track: Eye Tracking with Event Camera for Extended Reality (XR) Applications
Nealson Li, Ashwin Bhat, A. Raychowdhury
Eye tracking is an essential functionality for enabling extended reality (XR) applications. However, the latency and power constraints of an XR headset are tight. Unlike fixed-rate, frame-based RGB cameras, the event camera senses brightness changes and generates asynchronous sparse events with high temporal resolution. Although the event camera exhibits suitable characteristics for eye tracking in XR systems, processing an event-based data stream is a challenging task. In this paper, we present an event-based eye-tracking system that extracts pupil features. It is the first system that operates only with an event camera and requires no additional sensing hardware. First, we propose an event-to-frame conversion method that encodes the events triggered by eye motion into a 3-channel frame. Second, we train a Convolutional Neural Network (CNN) on 24 subjects to classify the events representing the pupil. Finally, we employ a region of interest (RoI) mechanism that tracks pupil location and reduces the amount of CNN inference by 96%. Our eye-tracking pipeline is able to locate the pupil with an error of 3.68 pixels at 160 mW system power.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168551 (published 2023-06-11)
Citations: 0
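The E-Track abstract above mentions an event-to-frame conversion into a 3-channel frame but does not say what the channels hold. The sketch below shows one plausible encoding, purely as an assumption: channel 0 counts ON events, channel 1 counts OFF events, and channel 2 stores the normalized timestamp of the most recent event per pixel. The event tuple layout and the function name are hypothetical.

```python
def events_to_frame(events, width, height, t_start, t_end):
    """Accumulate (x, y, t, polarity) events from [t_start, t_end) into three channels.

    Assumed encoding (not taken from the paper):
      on[y][x]      -- number of positive-polarity events
      off[y][x]     -- number of negative-polarity events
      recency[y][x] -- latest event time, scaled to [0, 1) within the window
    """
    on = [[0] * width for _ in range(height)]
    off = [[0] * width for _ in range(height)]
    recency = [[0.0] * width for _ in range(height)]
    span = t_end - t_start
    for x, y, t, polarity in events:
        if not (t_start <= t < t_end):
            continue                      # outside the accumulation window
        if not (0 <= x < width and 0 <= y < height):
            continue                      # outside the sensor area
        if polarity > 0:
            on[y][x] += 1
        else:
            off[y][x] += 1
        recency[y][x] = max(recency[y][x], (t - t_start) / span)
    return on, off, recency


if __name__ == "__main__":
    sample_events = [(1, 0, 10, 1), (1, 0, 30, 1), (0, 1, 20, -1)]
    on, off, recency = events_to_frame(sample_events, 2, 2, 0, 100)
    print(on, off, recency)
```

A dense frame like this is what lets a conventional CNN consume asynchronous events. The trade-off is that the accumulation window sets a floor on latency.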
Mapping-aware Biased Training for Accurate Memristor-based Neural Networks
Sumit Diware, A. Gebregiorgis, R. Joshi, S. Hamdioui, R. Bishnoi
Memristor-based computation-in-memory (CIM) can achieve high energy efficiency by processing the data within the memory, which makes it well-suited for applications like neural networks. However, memristors suffer from a conductance variation problem in which their programmed conductance values deviate from the desired values. Such variations lead to computational errors that degrade inference accuracy in CIM-based neural networks. In this paper, we present a mapping-aware biased training methodology to mitigate the impact of conductance variation on CIM-based neural networks. We first determine which conductance states of the memristor are inherently more immune to variation. The neural network is then trained under the constraint that important weights can only take numeric values that map directly to such favorable states. Simulation results show that our proposed mapping-aware biased training achieves up to 2.4× the hardware accuracy of conventional training.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168661 (published 2023-06-11)
Citations: 1
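The constraint described in the mapping-aware training abstract above, that important weights may only take values mapping to favorable conductance states, can be sketched as a projection step. This is an illustration under stated assumptions, not the authors' procedure: importance is approximated here by weight magnitude, the favorable levels are made-up numbers, and the projection is imagined as running after each optimizer update.

```python
def project_weights(weights, favorable_levels, important_fraction):
    """Snap the largest-magnitude weights to the nearest favorable level.

    Assumptions (hypothetical, for illustration only):
      - "important" means the top `important_fraction` of weights by magnitude
      - `favorable_levels` are weight values that map to low-variation states
    Remaining weights are left unconstrained.
    """
    count = round(len(weights) * important_fraction)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    important = set(by_magnitude[:count])
    projected = []
    for i, w in enumerate(weights):
        if i in important:
            projected.append(min(favorable_levels, key=lambda level: abs(level - w)))
        else:
            projected.append(w)
    return projected


if __name__ == "__main__":
    print(project_weights([0.9, -0.1, 0.45, -0.8], [-1.0, 0.0, 1.0], 0.5))
```

With the sample inputs, the two largest weights (0.9 and -0.8) move to 1.0 and -1.0 while the others are unchanged.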
SpatialHD: Spatial Transformer Fused with Hyperdimensional Computing for AI Applications
M. Bettayeb, Eman Hassan, Baker Mohammad, H. Saleh
Brain-inspired computing methods have shown remarkable efficiency and robustness compared to deep neural networks (DNN). In particular, HyperDimensional Computing (HDC) and Vision Transformer (ViT) have demonstrated promising achievements in facilitating effective and reliable cognitive learning. This paper proposes SpatialHD, the first framework that combines spatial transformer networks (STN) and HDC. First, SpatialHD exploits the STN, which explicitly allows the spatial manipulation of data within the network. Then, it employs HDC to operate on the STN output by mapping feature maps into high-dimensional space, learning abstracted information, and classifying data. In addition, the STN output is resized to generate a smaller input feature map, which further reduces computing complexity and memory storage compared to HDC alone. Finally, to test the model's functionality, we applied SpatialHD to image classification on the MNIST and Fashion-MNIST datasets, using only 25% of each dataset for training. Our results show that SpatialHD improves accuracy by ≈ 8% and enhances efficiency by approximately 2.5× compared to base-HDC.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168629 (published 2023-06-11)
Citations: 1
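The HDC stage mentioned in the SpatialHD abstract above (map features into a high-dimensional space, learn class information, classify) can be sketched with a generic bipolar hyperdimensional classifier. This is a standard textbook construction rather than the paper's implementation: the dimension, the position/level binding scheme, and the toy feature vectors are all assumptions, and no spatial transformer is included.

```python
import math
import random

DIM = 2000                      # hypervector dimension (assumed)
_rng = random.Random(0)         # fixed seed so the sketch is repeatable


def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [_rng.choice((-1, 1)) for _ in range(DIM)]


def encode(features, position_hvs, level_hvs):
    """Bind each position with its quantized level, then bundle by summation."""
    acc = [0] * DIM
    for pos, level in enumerate(features):
        p, l = position_hvs[pos], level_hvs[level]
        for d in range(DIM):
            acc[d] += p[d] * l[d]
    return acc


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def train(samples, position_hvs, level_hvs):
    """Build one prototype per class by summing the encoded training samples."""
    prototypes = {}
    for features, label in samples:
        hv = encode(features, position_hvs, level_hvs)
        proto = prototypes.setdefault(label, [0] * DIM)
        for d in range(DIM):
            proto[d] += hv[d]
    return prototypes


def classify(features, prototypes, position_hvs, level_hvs):
    hv = encode(features, position_hvs, level_hvs)
    return max(prototypes, key=lambda label: cosine(hv, prototypes[label]))


if __name__ == "__main__":
    position_hvs = [random_hv() for _ in range(4)]   # one per feature position
    level_hvs = [random_hv() for _ in range(2)]      # one per quantization level
    protos = train([([1, 1, 0, 0], "A"), ([0, 0, 1, 1], "B")], position_hvs, level_hvs)
    print(classify([1, 1, 0, 1], protos, position_hvs, level_hvs))
```

In the framework the abstract describes, the features fed to `encode` would come from the resized STN output instead of raw pixels, which is where the claimed savings in computation and memory arise.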
Bit-Offsetter: A Bit-serial DNN Accelerator with Weight-offset MAC for Bit-wise Sparsity Exploitation
Siqi He, Hongyi Zhang, Mengjie Li, Haozhe Zhu, Chixiao Chen, Qi Liu, Xiaoyang Zeng
With the rapid evolution of deep neural networks (DNNs), their massive computational burden makes it difficult to deploy DNNs on edge devices. This situation gives rise to specialized hardware that aims to exploit the sparsity of DNN parameters. Bit-serial architectures (BSAs) possess great performance potential by leveraging the abundant bit-wise sparsity. However, the distribution of effective bits of weights confines the performance of BSA designs. To improve the efficiency of BSAs, we propose a weight-offset multiply-accumulate (MAC) scheme and an associated hardware design called Bit-offsetter. Weight-offsetting not only significantly boosts bit-wise sparsity but also yields a more balanced distribution of essential bits. Aside from leveraging the abundant bit-wise sparsity induced by weight-offsetting, Bit-offsetter is also equipped with a load-balancing scheduler to reduce idle cycles and mitigate utilization degradation. According to our experiments on a series of DNN models, weight-offsetting can increase the bit-wise sparsity of pre-trained weights up to 77.4% on average. The weight-offset MAC scheme associated with Bit-offsetter achieves 3.28×/2.94× speedup/energy efficiency over the baseline.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168618 (published 2023-06-11)
Citations: 0
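The idea behind a weight-offset MAC, as described in the Bit-Offsetter abstract above, rests on a simple identity: sum(w·x) = sum((w − o)·x) + o·sum(x). If the offset o is chosen so that the shifted weights have fewer nonzero bits, a bit-serial datapath has less work to do. The sketch below demonstrates the identity and the bit count on made-up integers. The sign-magnitude bit counting and the choice of offset are assumptions, since the abstract does not specify either.

```python
def essential_bits(value):
    """Number of nonzero magnitude bits (sign-magnitude view, assumed)."""
    return bin(abs(value)).count("1")


def mac(weights, activations):
    """Plain multiply-accumulate."""
    return sum(w * a for w, a in zip(weights, activations))


def offset_mac(weights, activations, offset):
    """MAC on offset weights plus a correction term; equals the plain MAC."""
    shifted = [w - offset for w in weights]
    return mac(shifted, activations) + offset * sum(activations)


if __name__ == "__main__":
    weights = [7, 9, 8, 6, 10]          # hypothetical quantized weights
    activations = [1, 2, 3, 4, 5]       # hypothetical activations
    offset = 8                          # hypothetical offset near the weight mean
    print(sum(essential_bits(w) for w in weights))            # bits before offsetting
    print(sum(essential_bits(w - offset) for w in weights))   # bits after offsetting
    print(mac(weights, activations), offset_mac(weights, activations, offset))
```

The correction term needs only the sum of the activations, which can be shared across every output that uses the same offset.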
Live Demonstration: Efficient Organic Photodetector based Active Matrix Imager for Real-time Optical Character Recognition
Tong Shan, Jun Li, Xiao Hou, Peijin Huang, X. Guo
This live demonstration presents real-time character recognition using a portable system composed of an organic photodetector-based imaging array and a smartphone. The high specific detectivity of the organic photodiode enables sensitive imaging at ultra-low light intensity. Furthermore, a smartphone application built on a deep-learning-based algorithm is used for character recognition.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168609 (published 2023-06-11)
Citations: 0
SLIM-Net: Rethinking how neural networks use systolic arrays
T. Dalgaty, Maria Lepecq
Systolic arrays of processing elements are widely used to massively parallelise neural network layers. However, the execution of traditional convolutional and fully-connected layers on such hardware typically requires a non-negligible latency to distribute data over the array before each operation - data is not immediately in-place. This arises from the fundamental incompatibility between the physical spatial nature of a systolic array and the un-physical form of existing neural networks. We propose the systolic lateral mixer network (SLIM-Net) in an effort to reconcile this mismatch. The architecture of SLIM-Net maps directly onto the physical structure of a systolic array such that, after evaluating one layer, data immediately finds itself where it needs to be to begin the next. To evaluate the potential of SLIM-Net we compare it to a UNet model on a COCO segmentation task and find that, for models of equivalent size, SLIM-Net not only achieves a slightly better performance but requires almost an order of magnitude fewer MAC operations. Furthermore, we implement a lateral mixing layer on a systolic smart imager chip which executes seven times faster than similar convolutional layers on the same hardware and provides encouraging initial insights into the practicality of this new neuromorphic approach.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168580 (published 2023-06-11)
Citations: 0
FrameFire: Enabling Efficient Spiking Neural Network Inference for Video Segmentation
Qinyu Chen, Congyi Sun, Chang Gao, X. Fang, H. Luan
Fast video recognition is essential for real-time scenarios, e.g., autonomous driving. However, applying existing Deep Neural Networks (DNNs) to individual high-resolution images is expensive due to large model sizes. Spiking Neural Networks (SNNs) are developed as a promising alternative to DNNs due to their more realistic brain-inspired computing models. SNNs have sparse neuron firing over time, i.e., spatio-temporal sparsity; thus they are useful to enable energy-efficient computation. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. In this work, we, therefore, propose an SNN accelerator called FrameFire for efficient video processing. We introduce a Keyframe-dominated Workload Balance Schedule (KWBS) method. It accelerates the image recognition network with sparse keyframes, then records and analyzes the current workload distribution on hardware to facilitate scheduling workloads in subsequent frames. FrameFire is implemented on a Xilinx XC7Z035 FPGA and verified by video segmentation tasks. The results show that the throughput is improved by 1.7× with the KWBS method. FrameFire achieved 1.04 KFPS throughput and 1.15 mJ/frame recognition energy.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168660 (published 2023-06-11)
Citations: 0
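The FrameFire abstract describes recording the workload distribution on a keyframe and using it to schedule work in subsequent frames. The following is a minimal software sketch of that general idea, not the authors' KWBS implementation: it assumes the workload can be measured as a spike count per work unit (for example, per channel) and swaps in a plain greedy longest-first assignment to processing elements (PEs). The names `balance_from_keyframe` and `makespan`, and all numbers, are hypothetical.

```python
import heapq


def balance_from_keyframe(keyframe_load, num_pes):
    """Assign work units to PEs using loads recorded on a keyframe.

    keyframe_load: spike count per work unit, measured on the keyframe
                   (hypothetical measurement, assumed for illustration).
    Returns a list mapping each work unit index to a PE index.
    """
    # Min-heap of (accumulated load, PE id): the least-loaded PE is popped first.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = [0] * len(keyframe_load)
    # Greedy longest-first: place the heaviest units before the light ones.
    order = sorted(range(len(keyframe_load)),
                   key=lambda i: keyframe_load[i], reverse=True)
    for unit in order:
        load, pe = heapq.heappop(heap)
        assignment[unit] = pe
        heapq.heappush(heap, (load + keyframe_load[unit], pe))
    return assignment


def makespan(load, assignment, num_pes):
    """Load of the busiest PE, which bounds the frame's processing time."""
    totals = [0] * num_pes
    for unit, pe in enumerate(assignment):
        totals[pe] += load[unit]
    return max(totals)


if __name__ == "__main__":
    num_pes = 4
    keyframe_load = [90, 10, 80, 20, 70, 30, 60, 40]    # made-up spike counts
    next_frame_load = [85, 12, 78, 25, 66, 33, 61, 38]  # similar, not identical
    static = [i % num_pes for i in range(len(keyframe_load))]  # round-robin baseline
    balanced = balance_from_keyframe(keyframe_load, num_pes)
    print("round-robin:", makespan(next_frame_load, static, num_pes))
    print("keyframe-guided:", makespan(next_frame_load, balanced, num_pes))
```

The sketch relies on the assumption that consecutive video frames have similar sparsity patterns, so a schedule derived from the keyframe remains reasonably balanced on the frames that follow it.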
An Energy-Efficient and Reconfigurable CNN Accelerator Applied To Lung Cancer Detection
Yi Hsin Liao, Hsin Chen, K. Tang, Shu You Lin, Ding Xiao Wu, Yu-Chiao Chen, Hong Wen Luo
We propose a non-invasive system that detects lung cancer quickly and easily from a breath sample: the subject simply breathes into the device. Certain substances are found only in the breath of lung cancer patients. Based on this, we use a CNN model to extract features from the gas exhaled by the subject, and the network then outputs a lung cancer prediction. To accelerate the CNN computation, we design a hardware accelerator and implement it on an FPGA (Field Programmable Gate Array). By comparing the performance of different architectures, including power consumption and energy efficiency, we identify the most suitable architecture for this application. Ultimately, we reduce memory access by about 20% and energy consumption by 12%, achieving low-power operation on edge devices. The CNN model achieves a training accuracy of 88.41%, a testing accuracy of 85.29%, a false negative rate of 5.8%, and a false positive rate of 41.17%.
DOI: https://doi.org/10.1109/AICAS57966.2023.10168583. Publication date: 2023-06-11.
Citations: 0
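The lung cancer abstract reports accuracy together with false negative and false positive rates. As a reading aid, the sketch below shows how these three quantities are commonly computed from a binary confusion matrix. The abstract does not say which definitions the authors used, so the formulas here are the standard ones and are an assumption; the function name `classification_rates` and the counts in the example are hypothetical and are not the paper's data.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute accuracy, false negative rate and false positive rate.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives and false negatives (positive = cancer predicted/present).
    """
    positives = tp + fn   # all samples that are truly positive
    negatives = tn + fp   # all samples that are truly negative
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "false_negative_rate": fn / positives,  # missed cases among true positives
        "false_positive_rate": fp / negatives,  # false alarms among true negatives
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    rates = classification_rates(tp=45, fp=20, tn=30, fn=5)
    for name, value in rates.items():
        print(f"{name}: {value:.2%}")
```

Under these definitions, a low false negative rate alongside a high false positive rate means that few true cases are missed at the cost of many false alarms.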