Title: Embedded neuromorphic attention model leveraging a novel low-power heterogeneous platform
Authors: Amélie Gruel, Alfio Di Mauro, Robin Hunziker, L. Benini, Jean Martinet, M. Magno
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168603
Neuromorphic computing has been identified as an ideal candidate to exploit the potential of event-based cameras, a promising sensor for embedded computer vision. However, state-of-the-art neuromorphic models aim to maximize performance on large platforms rather than to balance memory requirements against performance. We present the first deployment of an embedded neuromorphic algorithm on Kraken, a low-power RISC-V-based SoC prototype that includes a neuromorphic spiking neural network (SNN) accelerator. The model employed in this paper was designed to achieve visual attention detection on event data while minimizing the size of the neuronal populations and the inference latency. Experimental results show that saliency detection in event data is possible with a delay of 32 ms, a classification accuracy of 84.51%, and an energy consumption of only 3.85 mJ per second of processed input data, all while processing input data 10 times faster than real time. This trade-off between decision latency, power consumption, accuracy, and run time significantly outperforms those achieved by previous implementations on CPU and neuromorphic hardware.
Title: Integrating Delta Modulation and Stochastic Computing for Real-time Machine Learning based Heartbeats Monitoring in Wearable Systems
Authors: Xiaochen Tang, Shanshan Liu, Farzad Niknia, Wei Tang, P. Reviriego, Fabrizio Lombardi
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168665
Real-time electrocardiogram (ECG) monitoring using wearable devices is crucial for early cardiovascular disease diagnosis, and it can be automated using machine learning (ML) algorithms. Unfortunately, wearable devices face stringent hardware resource constraints, so low-complexity designs that can implement ML-based detection of heartbeat anomalies are required. This paper proposes integrating a delta modulator (DM), used to digitize the ECG signal, with a Stochastic Computing (SC) implementation of the ML algorithms. The DM enables a low-cost conversion of the ECG to binary sequences that are then processed directly in the SC implementation of an ML algorithm. This eliminates the need to convert the DM outputs to integers and then to stochastic sequences, so the proposed integrated design considerably reduces the complexity of the system. The proposed scheme has been evaluated on a premature ventricular contraction (PVC) heartbeat recognition system based on a support vector machine classifier. The estimated chip area and power dissipation of the proposed system in a commercial 180 nm CMOS technology are 0.36 mm² and 0.6 µW, respectively, reductions of more than 38% and 54% compared to state-of-the-art solutions, while providing similar heartbeat anomaly detection performance.
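The two building blocks named in this abstract can be sketched in isolation. The Python below is a minimal illustration, not the paper's integrated design: `delta_modulate` is a textbook 1-bit delta modulator, `delta_demodulate` integrates its decisions back into a staircase, and `sc_multiply` shows the standard unipolar stochastic-computing trick of multiplying two probabilities by ANDing independent bitstreams. All function names, the fixed step size, and the zero initial estimate are assumptions made for this sketch.

```python
import numpy as np

def delta_modulate(signal, step):
    """Textbook 1-bit delta modulator (illustrative, fixed step size).

    Emits 1 when the sample is at or above the running estimate, else 0,
    and moves the estimate by +step or -step accordingly.
    """
    bits = np.zeros(len(signal), dtype=np.uint8)
    estimate = 0.0  # assumed initial estimate
    for i, sample in enumerate(signal):
        if sample >= estimate:
            bits[i] = 1
            estimate += step
        else:
            estimate -= step
    return bits

def delta_demodulate(bits, step):
    """Integrate the +step / -step decisions into a staircase approximation."""
    return np.cumsum(np.where(bits == 1, step, -step))

def sc_multiply(stream_a, stream_b):
    """Unipolar stochastic multiply: the AND of two independent bitstreams
    has a mean close to the product of their individual means."""
    return stream_a & stream_b
```

For a slowly varying input (per-sample change smaller than `step`, starting near zero), the staircase returned by `delta_demodulate` stays within one step of the signal. How the paper feeds the DM bitstream into the SC classifier without an intermediate conversion is not described in the abstract and is not modelled here.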
Title: E-Track: Eye Tracking with Event Camera for Extended Reality (XR) Applications
Authors: Nealson Li, Ashwin Bhat, A. Raychowdhury
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168551
Eye tracking is an essential functionality for extended reality (XR) applications. However, the latency and power constraints of an XR headset are tight. Unlike fixed-rate frame-based RGB cameras, the event camera senses brightness changes and generates asynchronous sparse events with high temporal resolution. Although the event camera has characteristics well suited to eye tracking in XR systems, processing an event-based data stream is a challenging task. In this paper, we present an event-based eye-tracking system that extracts pupil features. It is the first system that operates only with an event camera and requires no additional sensing hardware. First, we propose an event-to-frame conversion method that encodes the events triggered by eye motion into a 3-channel frame. Second, we train a Convolutional Neural Network (CNN) on 24 subjects to classify the events representing the pupil. Finally, we employ a region of interest (RoI) mechanism that tracks the pupil location and reduces the amount of CNN inference by 96%. Our eye-tracking pipeline locates the pupil with an error of 3.68 pixels at 160 mW system power.
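The abstract does not say what the three channels of the event-to-frame conversion contain, so the following Python is only one plausible sketch of the general idea. It assumes events arrive as arrays of pixel coordinates `x`, `y`, timestamps `t`, and polarities `p`, and it fills a hypothetical channel layout: positive-event count, negative-event count, and the normalized timestamp of the most recent event at each pixel. The function name and the layout are assumptions, not the paper's encoding.

```python
import numpy as np

def events_to_frame(x, y, t, p, height, width):
    """Accumulate a batch of events into a (height, width, 3) frame.

    Hypothetical channel layout (not taken from the paper):
      0: number of positive-polarity events per pixel
      1: number of negative-polarity events per pixel
      2: normalized timestamp (0..1) of the latest event per pixel
    """
    pos_count = np.zeros((height, width), dtype=np.float64)
    neg_count = np.zeros((height, width), dtype=np.float64)
    recency = np.zeros((height, width), dtype=np.float64)

    positive = p > 0
    # np.add.at accumulates correctly when the same pixel appears repeatedly.
    np.add.at(pos_count, (y[positive], x[positive]), 1.0)
    np.add.at(neg_count, (y[~positive], x[~positive]), 1.0)

    if len(t) > 0:
        span = float(t.max() - t.min()) or 1.0  # avoid division by zero
        t_norm = (t - t.min()) / span
        np.maximum.at(recency, (y, x), t_norm)

    return np.stack([pos_count, neg_count, recency], axis=-1)
```

A frame built this way can be passed to any image CNN. Restricting the conversion and the inference to a crop around the last known pupil position would be one way to realize a region-of-interest mechanism, although the abstract gives no detail on how the reported 96% reduction is obtained.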
Title: Mapping-aware Biased Training for Accurate Memristor-based Neural Networks
Authors: Sumit Diware, A. Gebregiorgis, R. Joshi, S. Hamdioui, R. Bishnoi
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168661
Memristor-based computation-in-memory (CIM) can achieve high energy efficiency by processing data within the memory, which makes it well suited for applications like neural networks. However, memristors suffer from conductance variation, in which their programmed conductance values deviate from the desired values. Such variations lead to computational errors that degrade inference accuracy in CIM-based neural networks. In this paper, we present a mapping-aware biased training methodology to mitigate the impact of conductance variation on CIM-based neural networks. We first determine which conductance states of the memristor are inherently more immune to variation. The neural network is then trained under the constraint that important weights can only take numeric values that map directly to such favorable states. Simulation results show that our proposed mapping-aware biased training achieves up to 2.4× the hardware accuracy of conventional training.
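One simple way to picture the constraint described above is a projection step: after each weight update, the weights marked as important are snapped to the nearest value that maps to a favorable conductance state, while the remaining weights are left free. The Python below sketches that step only. The function name, the choice of nearest-value projection, and the idea of supplying importance as a boolean mask are all assumptions for illustration; the abstract does not say how the paper selects important weights or enforces the constraint.

```python
import numpy as np

def project_to_favorable(weights, favorable_levels, important_mask):
    """Snap important weights to the nearest favorable level.

    weights          : array of real-valued weights
    favorable_levels : weight values assumed to map to variation-robust
                       conductance states (hypothetical input)
    important_mask   : boolean array, True where the constraint applies
    """
    levels = np.asarray(favorable_levels, dtype=float)
    # Distance from every weight to every level, then pick the closest level.
    nearest_idx = np.abs(weights[..., None] - levels).argmin(axis=-1)
    nearest = levels[nearest_idx]
    return np.where(important_mask, nearest, weights)
```

In a training loop this would typically be called after the optimizer step, for example with `important_mask = np.abs(weights) >= threshold` as a magnitude-based stand-in for importance.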
Title: SpatialHD: Spatial Transformer Fused with Hyperdimensional Computing for AI Applications
Authors: M. Bettayeb, Eman Hassan, Baker Mohammad, H. Saleh
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168629
Brain-inspired computing methods have shown remarkable efficiency and robustness compared to deep neural networks (DNNs). In particular, HyperDimensional Computing (HDC) and Vision Transformer (ViT) have demonstrated promising achievements in facilitating effective and reliable cognitive learning. This paper proposes SpatialHD, the first framework that combines spatial transformer networks (STN) and HDC. First, SpatialHD exploits the STN, which explicitly allows the spatial manipulation of data within the network. Then, it employs HDC to operate on the STN output by mapping feature maps into high-dimensional space, learning abstracted information, and classifying data. In addition, the STN output is resized to generate a smaller input feature map, which further reduces computing complexity and memory storage compared to HDC alone. Finally, to test the model's functionality, we applied SpatialHD to image classification on the MNIST and Fashion-MNIST datasets, using only 25% of the dataset for training. Our results show that SpatialHD improves accuracy by approximately 8% and enhances efficiency by approximately 2.5× compared to base-HDC.
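The HDC stage described above (map features into a high-dimensional space, learn per-class information, classify) can be illustrated with a generic hyperdimensional classifier. The Python below is a common textbook formulation, not SpatialHD itself: it encodes a feature vector with a random bipolar projection, bundles the encoded training samples of each class into a prototype, and classifies by normalized similarity. The function names, the projection-based encoder, and the default seed are assumptions, and the STN front end is omitted entirely.

```python
import numpy as np

def make_projection(num_features, dim, seed=0):
    """Random bipolar (+1/-1) projection matrix used as the encoder."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(num_features, dim))

def encode(features, projection):
    """Map a feature vector to a bipolar hypervector of length dim."""
    return np.where(features @ projection >= 0, 1.0, -1.0)

def train_prototypes(samples, labels, projection, num_classes):
    """Bundle (sum) the hypervectors of each class into one prototype."""
    prototypes = np.zeros((num_classes, projection.shape[1]))
    for sample, label in zip(samples, labels):
        prototypes[label] += encode(sample, projection)
    return prototypes

def classify(sample, prototypes, projection):
    """Return the class whose prototype is most similar to the query."""
    query = encode(sample, projection)
    norms = np.linalg.norm(prototypes, axis=1) + 1e-12
    return int(np.argmax(prototypes @ query / norms))
```

Shrinking the feature map before encoding, as the abstract describes for the resized STN output, reduces `num_features` and therefore the size of the projection matrix and the cost of each encoding.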
Title: Bit-Offsetter: A Bit-serial DNN Accelerator with Weight-offset MAC for Bit-wise Sparsity Exploitation
Authors: Siqi He, Hongyi Zhang, Mengjie Li, Haozhe Zhu, Chixiao Chen, Qi Liu, Xiaoyang Zeng
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168618
With the rapid evolution of deep neural networks (DNNs), their massive computational burden makes it difficult to deploy DNNs on edge devices. This situation has given rise to specialized hardware that exploits the sparsity of DNN parameters. Bit-serial architectures (BSAs) have great performance potential because they leverage the abundant bit-wise sparsity. However, the distribution of effective bits in the weights limits the performance of BSA designs. To improve the efficiency of BSAs, we propose a weight-offset multiply-accumulation (MAC) scheme and an associated hardware design called Bit-offsetter. Weight-offsetting not only significantly boosts bit-wise sparsity but also yields a more balanced distribution of essential bits. Besides leveraging the abundant bit-wise sparsity induced by weight-offsetting, Bit-offsetter is equipped with a load-balancing scheduler to reduce idle cycles and mitigate utilization degradation. According to our experiments on a series of DNN models, weight-offsetting can increase the bit-wise sparsity of pre-trained weights up to 77.4% on average. The weight-offset MAC scheme with Bit-offsetter achieves a 3.28× speedup and a 2.94× energy-efficiency improvement over the baseline.
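The arithmetic behind a weight-offset MAC can be shown with a small identity: for a shared offset o, the dot product sum(w_i * x_i) equals sum((w_i - o) * x_i) + o * sum(x_i), so a bit-serial engine only has to process the residuals w_i - o, which may contain far fewer non-zero bits. The Python below is a sketch of that identity under assumptions of our own: integer weights, a sign-magnitude view of the bits, and one offset shared by the whole group. How the paper chooses offsets and groups is not stated in the abstract.

```python
def essential_bits(value):
    """Number of non-zero bits in the magnitude of an integer weight."""
    return bin(abs(int(value))).count("1")

def bit_sparsity(weights, bit_width):
    """Fraction of zero bits across all weights (magnitude bits only)."""
    ones = sum(essential_bits(w) for w in weights)
    return 1.0 - ones / (len(weights) * bit_width)

def offset_mac(weights, activations, offset):
    """Dot product computed from offset residuals plus a shared correction.

    sum(w * x) == sum((w - offset) * x) + offset * sum(x)
    """
    residual_sum = sum((w - offset) * x for w, x in zip(weights, activations))
    return residual_sum + offset * sum(activations)
```

For the toy weights `[7, 9, 8, 7]` with an offset of 8, the residuals are `[-1, 1, 0, -1]`: the number of essential bits drops from 9 to 3 while `offset_mac` still returns the same dot product as the direct computation.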
Title: Live Demonstration: Efficient Organic Photodetector based Active Matrix Imager for Real-time Optical Character Recognition
Authors: Tong Shan, Jun Li, Xiao Hou, Peijin Huang, X. Guo
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168609
This live demonstration presents real-time character recognition using a portable system composed of an organic photodetector-based imaging array and a smartphone. The high specific detectivity of the organic photodiode enables sensitive imaging at ultra-low light intensity. Furthermore, a smartphone application based on a deep learning-trained algorithm is used for character recognition.
Title: SLIM-Net: Rethinking how neural networks use systolic arrays
Authors: T. Dalgaty, Maria Lepecq
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168580
Systolic arrays of processing elements are widely used to massively parallelise neural network layers. However, executing traditional convolutional and fully-connected layers on such hardware typically incurs a non-negligible latency to distribute data over the array before each operation, because the data is not immediately in place. This arises from the fundamental incompatibility between the physical, spatial nature of a systolic array and the non-physical form of existing neural networks. We propose the systolic lateral mixer network (SLIM-Net) in an effort to reconcile this mismatch. The architecture of SLIM-Net maps directly onto the physical structure of a systolic array such that, after evaluating one layer, data immediately finds itself where it needs to be to begin the next. To evaluate the potential of SLIM-Net, we compare it to a UNet model on a COCO segmentation task and find that, for models of equivalent size, SLIM-Net not only achieves slightly better performance but also requires almost an order of magnitude fewer MAC operations. Furthermore, we implement a lateral mixing layer on a systolic smart imager chip; it executes seven times faster than similar convolutional layers on the same hardware and provides encouraging initial insights into the practicality of this new neuromorphic approach.
Title: FrameFire: Enabling Efficient Spiking Neural Network Inference for Video Segmentation
Authors: Qinyu Chen, Congyi Sun, Chang Gao, X. Fang, H. Luan
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168660
Fast video recognition is essential for real-time scenarios such as autonomous driving. However, applying existing Deep Neural Networks (DNNs) to individual high-resolution images is expensive due to large model sizes. Spiking Neural Networks (SNNs) have been developed as a promising alternative to DNNs because of their more realistic brain-inspired computing models. SNNs exhibit sparse neuron firing over time, i.e., spatio-temporal sparsity, which makes them useful for energy-efficient computation. However, exploiting the spatio-temporal sparsity of SNNs in hardware leads to unpredictable and unbalanced workloads, degrading energy efficiency. We therefore propose an SNN accelerator called FrameFire for efficient video processing. We introduce a Keyframe-dominated Workload Balance Schedule (KWBS) method, which accelerates the image recognition network with sparse keyframes, then records and analyzes the current workload distribution on hardware to facilitate scheduling workloads in subsequent frames. FrameFire is implemented on a Xilinx XC7Z035 FPGA and verified on video segmentation tasks. The results show that the KWBS method improves throughput by 1.7×. FrameFire achieves a throughput of 1.04 KFPS and a recognition energy of 1.15 mJ/frame.
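The general idea of using a keyframe's measured workload to balance later frames can be sketched with a classic greedy heuristic. The Python below is not the KWBS method, whose details are not given in the abstract. It assumes a hypothetical setup in which the spike count of each channel is recorded on a keyframe and the channels are then assigned to processing elements (PEs), largest first, always to the currently least-loaded PE. The function name and the notion of per-channel spike counts as the workload measure are assumptions.

```python
import heapq

def balance_workloads(spike_counts, num_pes):
    """Greedy longest-first assignment of channels to processing elements.

    spike_counts : per-channel workload measured on a keyframe (assumed)
    num_pes      : number of processing elements
    Returns (assignment, loads): channel indices per PE and the total load per PE.
    """
    heap = [(0, pe) for pe in range(num_pes)]  # (current load, PE index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]

    # Visit channels from heaviest to lightest.
    order = sorted(range(len(spike_counts)), key=lambda c: -spike_counts[c])
    for channel in order:
        load, pe = heapq.heappop(heap)  # least-loaded PE so far
        assignment[pe].append(channel)
        heapq.heappush(heap, (load + spike_counts[channel], pe))

    loads = [sum(spike_counts[c] for c in chans) for chans in assignment]
    return assignment, loads
```

The resulting assignment would then be reused for the frames that follow the keyframe, on the assumption that firing patterns change slowly between neighbouring video frames.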
Title: An Energy-Efficient and Reconfigurable CNN Accelerator Applied To Lung Cancer Detection
Authors: Yi Hsin Liao, Hsin Chen, K. Tang, Shu You Lin, Ding Xiao Wu, Yu-Chiao Chen, Hong Wen Luo
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Pub Date: 2023-06-11, DOI: 10.1109/AICAS57966.2023.10168583
We propose a non-invasive system that detects lung cancer quickly and easily when the user breathes into the device. Certain substances are present only in the breath of lung cancer patients. Based on this, we use a CNN model to extract features from the gas exhaled by the test subject, and the network then outputs a lung cancer prediction. To accelerate the CNN computation, we design a hardware accelerator and implement it on an FPGA (Field Programmable Gate Array). By comparing the power consumption and energy efficiency of different architectures, we identify the most appropriate one for our application. Ultimately, we reduce memory access by about 20% and energy consumption by 12%, achieving low power on edge devices. The CNN model achieves a training accuracy of 88.41%, a testing accuracy of 85.29%, a false negative rate of 5.8%, and a false positive rate of 41.17%.
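The accuracy and error rates quoted above are all derived from a confusion matrix, and a short Python helper makes the definitions explicit. The abstract does not report the underlying counts, so the example counts mentioned after the code are hypothetical values chosen only to show the calculation; they are not taken from the paper.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, false negative rate, and false positive rate from counts.

    tp: cancer cases predicted as cancer     fn: cancer cases missed
    fp: healthy cases predicted as cancer    tn: healthy cases predicted healthy
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_negative_rate": fn / (fn + tp),  # share of cancer cases missed
        "false_positive_rate": fp / (fp + tn),  # share of healthy cases flagged
    }
```

With hypothetical counts `tp=48, fn=3, fp=7, tn=10`, the helper gives an accuracy of about 85.29%, a false negative rate of about 5.88%, and a false positive rate of about 41.18%, which shows how a low miss rate and a high false alarm rate can coexist with the headline accuracy when healthy cases are a minority of the test set.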