TRIO: a Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168596
Thanh-Dat Nguyen, Minh-Son Le, Thi-Nhan Pham, I. Chang
We introduce TRIO, a 10T SRAM cell for in-memory computing (IMC) circuits in ternary neural networks (TNNs). TRIO's thin-cell-type layout occupies only 0.492 μm² in a 28 nm FD-SOI technology, smaller than some state-of-the-art ternary SRAM cells. Compared with other works, TRIO consumes less analog multiplication power, indicating its potential for improving the area and power efficiency of TNN IMC circuits. In simulations, our optimized TNN IMC circuit using TRIO achieved high area and power efficiencies of 369.39 TOPS/mm² and 333.8 TOPS/W.
{"title":"TRIO: a Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks","authors":"Thanh-Dat Nguyen, Minh-Son Le, Thi-Nhan Pham, I. Chang","doi":"10.1109/AICAS57966.2023.10168596","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168596","url":null,"abstract":"We introduce TRIO, a 10T SRAM cell for inmemory computing circuits in ternary neural networks (TNNs). TRIO's thin-cell type layout occupies only 0.492μm2 in a 28nm FD-SOI technology, which is smaller than some state-of-the-art ternary SRAM cells. Comparing TRIO to other works, we found that it consumes less analog multiplication power, indicating its potential for improving the area and power efficiency of TNN IMC circuits. Our optimized TNN IMC circuit using TRIO achieved high area and power efficiencies of 369.39 TOPS/mm2 and 333.8 TOPS/W in simulations.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132038767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning Compiler Optimization on Multi-Chiplet Architecture
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168656
Huiqing Xu, Kuang Mao, Quihong Pan, Zhaorong Tang, Mengdi Wang, Ying Wang
Multi-chiplet architectures can provide a high-performance solution for emerging workloads such as deep learning models. To fully utilize the chiplets and accelerate the execution of deep learning models, we present a deep learning compilation optimization framework for chiplets and propose a scheduling method based on data dependence. Experiments show that our method improves compilation efficiency, and the resulting schedules perform at least 1-2× better than traditional algorithms.
{"title":"Deep Learning Compiler Optimization on Multi-Chiplet Architecture","authors":"Huiqing Xu, Kuang Mao, Quihong Pan, Zhaorong Tang, Mengdi Wang, Ying Wang","doi":"10.1109/AICAS57966.2023.10168656","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168656","url":null,"abstract":"Multi-chiplet architecture can provide a high-performance solution for new tasks such as deep learning models. In order to fully utilize chiplets and accelerate the execution of deep learning models, we present a deep learning compilation optimization framework for chiplets, and propose a scheduling method based on data dependence. Experiments show that our method improves the compilation efficiency, and the performance of the scheduling scheme is at least 1-2 times higher than the traditional algorithms.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132168979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NeuroBMI: A New Neuromorphic Implantable Wireless Brain Machine Interface with a 0.48 µW Event-Driven Noise-Tolerant Spike Detector
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168619
Jinbo Chen, Hui Wu, Xing Liu, Razieh Eskandari, Fengshi Tian, Wenjun Zou, Chaoming Fang, Jie Yang, M. Sawan
Brain-machine interfaces (BMIs) have seen widespread application in neuroscience research and neural prosthetics. As the technology shifts from wearable to implantable wireless BMIs with increasing channel counts, the volume of generated data demands impractically high bandwidth and transmission power from the implants. In this paper, we present NeuroBMI, a novel neuromorphic implantable wireless BMI that leverages a unified neuromorphic strategy for neural signal sampling, processing, and transmission, reducing both the transmitted data rate and the overall power consumption. NeuroBMI exploits the high sparsity of neural signals through an integrate-and-fire-sampling-based analog-to-spike converter (ASC), which generates digital spike trains from triggered events and avoids unnecessary data sampling. An event-driven noise-tolerant spike detector and an event-driven spike transmitter further reduce the energy consumption and transmitted data rate. Simulation results demonstrate that NeuroBMI achieves a data compression ratio of 520, with the proposed spike detector consuming only 0.48 µW.
{"title":"NeuroBMI: A New Neuromorphic Implantable Wireless Brain Machine Interface with A 0.48 µW Event-Driven Noise-Tolerant Spike Detector","authors":"Jinbo Chen, Hui Wu, Xing Liu, Razieh Eskandari, Fengshi Tian, Wenjun Zou, Chaoming Fang, Jie Yang, M. Sawan","doi":"10.1109/AICAS57966.2023.10168619","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168619","url":null,"abstract":"The use of Brain-Machine Interfaces (BMIs) in neuroscience research and neural prosthetics has seen widespread application. With the technology trend shifting from wearable to implantable wireless BMIs featuring increasing channel counts, the volume of data generated requires impractically high bandwidth and transmission power for the implants. In this paper, we present NeuroBMI, a novel neuromorphic implantable wireless BMI that leverages a unified neuromorphic strategy for neural signal sampling, processing, and transmission. The proposed NeuroBMI and neuromorphic strategy reduces transmitted data rate and overall power consumption. NeuroBMI takes into account the high sparsity of neural signals by employing an integrateand-fire sampling based analog-to-spike converter (ASC), which generates digital spike trains based on triggered events and avoids unnecessary data sampling. Additionally, an event-driven noise-tolerant spike detector and event-driven spike transmitter are also proposed, to further reduce the energy consumption and transmitted data rate. Simulation results demonstrate that the proposed NeuroBMI achieves a data compression ratio of 520, with the proposed spike detector consuming only 0.48 µW.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121899956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MF-DSNN: An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168643
Yue Zhang, Shuai Wang, Yi Kang
Inspired by the structure of the brain, spiking neural networks (SNNs) are computing models that communicate and compute through spikes. Well-trained SNNs exhibit high sparsity in both weights and activations, distributed spatially and temporally. This sparsity presents both opportunities and challenges for energy-efficient SNN inference compared with conventional artificial neural networks (ANNs): the high sparsity can significantly reduce inference delay and energy consumption, but the temporal dimension greatly complicates the design of spiking accelerators. In this paper, we propose a unique solution for sparse spiking neural network acceleration. First, we adopt a temporal coding scheme called FS coding, which differs from the rate coding used in traditional SNNs; due to the nature of FS coding, our design eliminates the need for multiplication. Second, we parallelize the computation required by each neuron at every time point to minimize accesses to the weight data. Third, we fuse multiple spikes into one new spike to reduce inference delay and energy consumption. The proposed architecture delivers better performance and energy efficiency at lower cost: running MobileNet-V2 on the ImageNet dataset, MF-DSNN achieves 6× to 22× energy efficiency improvements over state-of-the-art artificial neural network accelerators while suffering less than 0.9% accuracy degradation and using less silicon area.
{"title":"MF-DSNN:An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator","authors":"Yue Zhang, Shuai Wang, Yi Kang","doi":"10.1109/AICAS57966.2023.10168643","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168643","url":null,"abstract":"Inspired by the brain structure, Spiking Neural Networks (SNNs) are computing models communicating and calculating through spikes. SNNs that are well-trained demonstrate high sparsity in both weight and activation, distributed spatially and temporally. This sparsity presents both opportunities and challenges for high energy efficiency inference computing of SNNs when compared to conventional artificial neural networks (ANNs). Specifically, the high sparsity can significantly reduce inference delay and energy consumption. However, the temporal dimension greatly complicates the design of spiking accelerators. In this paper, we propose a unique solution for sparse spiking neural network acceleration. First, we adopt a temporal coding scheme called FS coding which differs from the rate coding used in traditional SNNs. Our design eliminates the need for multiplication due to the nature of FS coding. Second, we parallelize the computation required for the neuron at each time point to minimize the access of the weight data. Third, we fuse multiple spikes into one new spike to reduce inference delay and energy consumption. Our proposed architecture exhibits better performance and energy efficiency with less cost. Our experiments show that running MobileNet-V2, MF-DSNN achieves 6× to 22× energy efficiency improvements while having an accuracy degradation of less than 0.9% and using less silicon area on the ImageNet dataset compared to state-of-the-art artificial neural network accelerators.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131575534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168566
Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker
A key challenge in computing convolutional neural networks (CNNs), besides the vast number of computations, is the associated large number of energy-intensive transactions from main to local memory. In this paper, we present a methodical approach to maximize and prune coarse-grained, regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simpler and less resource-intensive pruning than irregular sparsity, saving memory transactions and computations. Our threshold-based technique maximizes the number of regular sparse blocks in each layer. The wide range of threshold combinations, which arises from the close correlation between the number of sparse blocks and network accuracy, can be explored automatically by our exploration tool Spex. To harness the found sparse blocks for reducing memory transactions and MAC operations, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluating our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% in memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations, with a 1% accuracy or 1 mAP drop, respectively.
{"title":"A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs","authors":"Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker","doi":"10.1109/AICAS57966.2023.10168566","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168566","url":null,"abstract":"A key challenge in computing convolutional neural networks (CNNs) besides the vast number of computations are the associated numerous energy-intensive transactions from main to local memory. In this paper, we present our methodical approach to maximize and prune coarse-grained regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simplified and resource inexpensive pruning compared to irregular sparsity, saving memory transactions and computations. Our threshold-based technique allows maximizing the number of regular sparse blocks in each layer. The wide range of threshold combinations that result from the close correlation between the number of sparse blocks and network accuracy can be explored automatically by our exploration tool Spex. To harness found sparse blocks for memory transaction and MAC operation reduction, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network hardware accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluation of our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations with a 1% or 1 mAP accuracy drop, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126446064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168589
Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann
Underutilization of compute resources degrades the performance of single-core machine learning (ML) accelerators. Multi-core accelerators therefore divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator's behavior becomes more complex: supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and to direct memory access controller (DMAC) programming. Correctly estimating these effects is crucial for optimizing multi-core accelerators, especially during design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimation in multi-core ML accelerators: simulating only the dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the methodology both precise and fast. We evaluate it on an in-house configurable accelerator with various computational core configurations on two widely used convolutional neural networks (CNNs), estimating accelerator latency with an average error of 4.7%.
{"title":"Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators","authors":"Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann","doi":"10.1109/AICAS57966.2023.10168589","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168589","url":null,"abstract":"Underutilization of compute resources leads to decreased performance of single-core machine learning (ML) accelerators. Therefore, multi-core accelerators divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator behavior becomes more complex. Supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and direct memory access controller (DMAC) programming. Correctly estimating these effects becomes crucial for optimizing multi-core accelerators, especially in design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimations in multi-core ML accelerators. Simulating only dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the proposed methodology precise and fast. We evaluate our methodology on an in-house configurable accelerator with various computational cores on two widely used convolutional neural networks (CNNs). We can estimate the accelerator latency with an average error of 4.7%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129726303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Memristor-Inspired Computation for Epileptiform Signals in Spheroids
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168611
Ivan Diez-de-los-Rios, J. Ephraim, G. Palazzolo, T. Serrano-Gotarredona, G. Panuccio, B. Linares-Barranco
In this paper we present a memristor-inspired computational method for obtaining a type of running "spectrogram" or "fingerprint" of epileptiform activity generated by rodent hippocampal spheroids. It can compute, on the fly and at low computational cost, an alert-level signal for the onset of epileptiform events. We describe the computational method behind this "fingerprint" technique and illustrate it using epileptiform events recorded from hippocampal spheroids with a microelectrode array system.
{"title":"A Memristor-Inspired Computation for Epileptiform Signals in Spheroids","authors":"Ivan Diez-de-los-Rios, J. Ephraim, G. Palazzolo, T. Serrano-Gotarredona, G. Panuccio, B. Linares-Barranco","doi":"10.1109/AICAS57966.2023.10168611","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168611","url":null,"abstract":"In this paper we present a memristor-inspired computational method for obtaining a type of running \"spectrogram\" or \"fingerprint\" of epileptiform activity generated by rodent hippocampal spheroids. It can be used to compute on the fly and with low computational cost an alert-level signal for epileptiform events onset. Here, we describe the computational method behind this \"fingerprint\" technique and illustrate it using epileptiform events recorded from hippocampal spheroids using a microelectrode array system.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133529207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Live Demonstration: SRAM Compute-In-Memory Based Visual & Aural Recognition System
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168569
Anjunyi Fan, Bo Hu, Zhonghua Jin, Haiyue Han, Yaojun Zhang, Yue Yang, Yuchao Yang, Bonan Yan, Ru Huang
We propose a live demonstration at AICAS 2023 of commercial SRAM compute-in-memory (CIM) accelerators. The demonstration includes both visual and aural signal processing and classification performed by SRAM-based CIM engines. The visual part is a low-power face recognition platform that displays and detects the audience's faces in real time. The aural part is a keyword spotting engine with which the audience can interact to control the device for designated tasks (such as "volume up" and "volume down"). The demonstration is interactive and conveys a live feel for the energy efficiency improvements achievable with commercial CIM accelerators.
{"title":"Live Demonstration: SRAM Compute-In-Memory Based Visual & Aural Recognition System","authors":"Anjunyi Fan, Bo Hu, Zhonghua Jin, Haiyue Han, Yaojun Zhang, Yue Yang, Yuchao Yang, Bonan Yan, Ru Huang","doi":"10.1109/AICAS57966.2023.10168569","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168569","url":null,"abstract":"We propose a live demonstration at AICAS’2023 for commercial SRAM compute-in-memory (CIM) accelerators. This live demonstration includes both visual and aural signal processing and classification performed by SRAM-based CIM engines. The visual part is a low-power face recognition platform, which can display and detect the audience’s faces in real-time. The aural part is a key word spotting engine, with which the audience can interact and control the device for designated tasks (such as \"volume up\" and \"volume down\"). This live demonstration is interactive and can bring a live feeling of energy efficiency improvement using the commercial CIM accelerators.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130174180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hierarchically Reconfigurable SRAM-Based Compute-in-Memory Macro for Edge Computing
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168564
Runxi Wang, Xinfei Guo
AI running on the edge requires silicon that can meet demanding performance requirements within an aggressive power and area budget. Frequently updated AI algorithms also demand processors flexible enough to exploit their advantages. Compute-in-memory (CIM) architectures appear to be a promising energy-efficient solution, completing intensive computations in situ where the data are stored. While prior works have made great progress in designing SRAM-based CIM macros with fixed functionality tailored to specific AI applications, the flexibility needed for wider usage scenarios is missing. In this paper, we propose a novel SRAM-based CIM macro that can be hierarchically configured to support various Boolean operations, arithmetic operations, and macro operations. In addition, we demonstrate with an example that the proposed design can be extended to support further essential edge computations with minimal overhead. Compared with existing reconfigurable SRAM-based CIM macros, this work achieves a better balance between reconfigurability and hardware cost by implementing flexibility at multiple design hierarchies.
{"title":"A Hierarchically Reconfigurable SRAM-Based Compute-in-Memory Macro for Edge Computing","authors":"Runxi Wang, Xinfei Guo","doi":"10.1109/AICAS57966.2023.10168564","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168564","url":null,"abstract":"AI running on the edge requires silicon that can meet demanding performance requirements while meeting the aggressive power and area budget. Frequently updated AI algorithms also demand matched processors to well employ their advantages. Compute-in-memory (CIM) architecture appears as a promising energy-efficient solution that completes the intensive computations in-situ where the data are stored. While prior works have shown great progress in designing SRAM-based CIM macros with fixed functionality that were tailored for specific AI applications, the flexibility reserved for wider usage scenarios is missing. In this paper, we propose a novel SRAM-based CIM macro that can be hierarchically configured to support various boolean operations, arithmetic operations, and macro operations. In addition, we demonstrate with an example that the proposed design can be expanded to support more essential edge computations with minimal overhead. Compared with the existing reconfigurable SRAM-based CIM macros, this work achieves a greater balance of reconfigurability vs. hardware cost by implementing flexibility at various design hierarchies.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114621182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FPGA-Based High-Speed and Resource-Efficient 3D Reconstruction for Structured Light System
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168616
Feng Bao, Zehua Dong, Jie Yu, Songping Mai
To achieve high-speed, low-resource 3D measurement, we propose a parallel, fully pipelined FPGA architecture for the phase-measuring profilometry algorithm. The proposed system uses four-step phase shifting and Gray-code decoding to generate accurate 3D point clouds. Experimental results show that the architecture can process 12 frames of 720 × 540 images in just 12.2 ms, 110 times faster than the same implementation in software, with the smallest resource consumption among comparable FPGA systems. This makes the proposed system well suited to high-speed embedded 3D shape measurement applications.
{"title":"FPGA-Based High-Speed and Resource-Efficient 3D Reconstruction for Structured Light System","authors":"Feng Bao, Zehua Dong, Jie Yu, Songping Mai","doi":"10.1109/AICAS57966.2023.10168616","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168616","url":null,"abstract":"To achieve high-speed and low-resource consumption 3D measurement, we propose a parallel and full-pipeline FPGA architecture for the phase measuring profilometry algorithm. The proposed system uses four-step phase-shifting and gray code decoding to generate accurate 3D point clouds. Experimental results show that the proposed architecture can process 12 frames of images with a resolution of 720 × 540 in just 12.2 ms, which is 110 times faster than the same implementation in software, and has the smallest resource consumption compared with other similar FPGA systems. This makes the proposed system very suitable for high-speed embedded 3D shape measurement applications.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132935430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}