TRIO: a Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168596
Thanh-Dat Nguyen, Minh-Son Le, Thi-Nhan Pham, I. Chang
We introduce TRIO, a 10T SRAM cell for in-memory computing (IMC) circuits in ternary neural networks (TNNs). TRIO's thin-cell-type layout occupies only 0.492 μm² in a 28 nm FD-SOI technology, smaller than some state-of-the-art ternary SRAM cells. Compared with other works, TRIO consumes less analog multiplication power, indicating its potential for improving the area and power efficiency of TNN IMC circuits. In simulations, our optimized TNN IMC circuit using TRIO achieved high area and power efficiencies of 369.39 TOPS/mm² and 333.8 TOPS/W.
{"title":"TRIO: a Novel 10T Ternary SRAM Cell for Area-Efficient In-memory Computing of Ternary Neural Networks","authors":"Thanh-Dat Nguyen, Minh-Son Le, Thi-Nhan Pham, I. Chang","doi":"10.1109/AICAS57966.2023.10168596","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168596","url":null,"abstract":"We introduce TRIO, a 10T SRAM cell for inmemory computing circuits in ternary neural networks (TNNs). TRIO's thin-cell type layout occupies only 0.492μm2 in a 28nm FD-SOI technology, which is smaller than some state-of-the-art ternary SRAM cells. Comparing TRIO to other works, we found that it consumes less analog multiplication power, indicating its potential for improving the area and power efficiency of TNN IMC circuits. Our optimized TNN IMC circuit using TRIO achieved high area and power efficiencies of 369.39 TOPS/mm2 and 333.8 TOPS/W in simulations.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132038767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning Compiler Optimization on Multi-Chiplet Architecture
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168656
Huiqing Xu, Kuang Mao, Quihong Pan, Zhaorong Tang, Mengdi Wang, Ying Wang
Multi-chiplet architectures can provide a high-performance solution for emerging workloads such as deep learning models. To fully utilize the chiplets and accelerate the execution of deep learning models, we present a deep learning compilation optimization framework for chiplets and propose a scheduling method based on data dependence. Experiments show that our method improves compilation efficiency, and the resulting schedules perform at least 1-2× better than traditional algorithms.
{"title":"Deep Learning Compiler Optimization on Multi-Chiplet Architecture","authors":"Huiqing Xu, Kuang Mao, Quihong Pan, Zhaorong Tang, Mengdi Wang, Ying Wang","doi":"10.1109/AICAS57966.2023.10168656","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168656","url":null,"abstract":"Multi-chiplet architecture can provide a high-performance solution for new tasks such as deep learning models. In order to fully utilize chiplets and accelerate the execution of deep learning models, we present a deep learning compilation optimization framework for chiplets, and propose a scheduling method based on data dependence. Experiments show that our method improves the compilation efficiency, and the performance of the scheduling scheme is at least 1-2 times higher than the traditional algorithms.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132168979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NeuroBMI: A New Neuromorphic Implantable Wireless Brain Machine Interface with a 0.48 µW Event-Driven Noise-Tolerant Spike Detector
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168619
Jinbo Chen, Hui Wu, Xing Liu, Razieh Eskandari, Fengshi Tian, Wenjun Zou, Chaoming Fang, Jie Yang, M. Sawan
Brain-machine interfaces (BMIs) have seen widespread application in neuroscience research and neural prosthetics. As the technology shifts from wearable to implantable wireless BMIs with increasing channel counts, the volume of generated data demands impractically high bandwidth and transmission power from the implants. In this paper, we present NeuroBMI, a novel neuromorphic implantable wireless BMI that leverages a unified neuromorphic strategy for neural signal sampling, processing, and transmission, reducing both the transmitted data rate and the overall power consumption. NeuroBMI exploits the high sparsity of neural signals through an integrate-and-fire-sampling-based analog-to-spike converter (ASC), which generates digital spike trains from triggered events and avoids unnecessary data sampling. An event-driven noise-tolerant spike detector and an event-driven spike transmitter further reduce the energy consumption and transmitted data rate. Simulation results demonstrate that NeuroBMI achieves a data compression ratio of 520, with the proposed spike detector consuming only 0.48 µW.
{"title":"NeuroBMI: A New Neuromorphic Implantable Wireless Brain Machine Interface with A 0.48 µW Event-Driven Noise-Tolerant Spike Detector","authors":"Jinbo Chen, Hui Wu, Xing Liu, Razieh Eskandari, Fengshi Tian, Wenjun Zou, Chaoming Fang, Jie Yang, M. Sawan","doi":"10.1109/AICAS57966.2023.10168619","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168619","url":null,"abstract":"The use of Brain-Machine Interfaces (BMIs) in neuroscience research and neural prosthetics has seen widespread application. With the technology trend shifting from wearable to implantable wireless BMIs featuring increasing channel counts, the volume of data generated requires impractically high bandwidth and transmission power for the implants. In this paper, we present NeuroBMI, a novel neuromorphic implantable wireless BMI that leverages a unified neuromorphic strategy for neural signal sampling, processing, and transmission. The proposed NeuroBMI and neuromorphic strategy reduces transmitted data rate and overall power consumption. NeuroBMI takes into account the high sparsity of neural signals by employing an integrateand-fire sampling based analog-to-spike converter (ASC), which generates digital spike trains based on triggered events and avoids unnecessary data sampling. Additionally, an event-driven noise-tolerant spike detector and event-driven spike transmitter are also proposed, to further reduce the energy consumption and transmitted data rate. Simulation results demonstrate that the proposed NeuroBMI achieves a data compression ratio of 520, with the proposed spike detector consuming only 0.48 µW.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121899956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MF-DSNN: An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168643
Yue Zhang, Shuai Wang, Yi Kang
Inspired by the structure of the brain, spiking neural networks (SNNs) are computing models that communicate and compute through spikes. Well-trained SNNs exhibit high sparsity in both weights and activations, distributed spatially and temporally. This sparsity presents both opportunities and challenges for energy-efficient SNN inference compared with conventional artificial neural networks (ANNs): the high sparsity can significantly reduce inference delay and energy consumption, but the temporal dimension greatly complicates the design of spiking accelerators. In this paper, we propose a unique solution for sparse spiking neural network acceleration. First, we adopt a temporal coding scheme called FS coding, which differs from the rate coding used in traditional SNNs; due to the nature of FS coding, our design eliminates the need for multiplication. Second, we parallelize the computation required by each neuron at every time point to minimize accesses to the weight data. Third, we fuse multiple spikes into one new spike to reduce inference delay and energy consumption. The proposed architecture delivers better performance and energy efficiency at lower cost: running MobileNet-V2 on the ImageNet dataset, MF-DSNN achieves 6× to 22× energy efficiency improvements over state-of-the-art artificial neural network accelerators while suffering less than 0.9% accuracy degradation and using less silicon area.
{"title":"MF-DSNN:An Energy-efficient High-performance Multiplication-free Deep Spiking Neural Network Accelerator","authors":"Yue Zhang, Shuai Wang, Yi Kang","doi":"10.1109/AICAS57966.2023.10168643","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168643","url":null,"abstract":"Inspired by the brain structure, Spiking Neural Networks (SNNs) are computing models communicating and calculating through spikes. SNNs that are well-trained demonstrate high sparsity in both weight and activation, distributed spatially and temporally. This sparsity presents both opportunities and challenges for high energy efficiency inference computing of SNNs when compared to conventional artificial neural networks (ANNs). Specifically, the high sparsity can significantly reduce inference delay and energy consumption. However, the temporal dimension greatly complicates the design of spiking accelerators. In this paper, we propose a unique solution for sparse spiking neural network acceleration. First, we adopt a temporal coding scheme called FS coding which differs from the rate coding used in traditional SNNs. Our design eliminates the need for multiplication due to the nature of FS coding. Second, we parallelize the computation required for the neuron at each time point to minimize the access of the weight data. Third, we fuse multiple spikes into one new spike to reduce inference delay and energy consumption. Our proposed architecture exhibits better performance and energy efficiency with less cost. Our experiments show that running MobileNet-V2, MF-DSNN achieves 6× to 22× energy efficiency improvements while having an accuracy degradation of less than 0.9% and using less silicon area on the ImageNet dataset compared to state-of-the-art artificial neural network accelerators.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131575534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168566
Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker
A key challenge in computing convolutional neural networks (CNNs), besides the vast number of computations, is the associated large number of energy-intensive transactions from main to local memory. In this paper, we present a methodical approach to maximize and prune coarse-grained, regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simpler and less resource-intensive pruning than irregular sparsity, saving memory transactions and computations. Our threshold-based technique maximizes the number of regular sparse blocks in each layer. The wide range of threshold combinations, which arises from the close correlation between the number of sparse blocks and network accuracy, can be explored automatically by our exploration tool Spex. To harness the found sparse blocks for reducing memory transactions and MAC operations, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluating our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% in memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations, with a 1% accuracy or 1 mAP drop, respectively.
{"title":"A Hardware-Centric Approach to Increase and Prune Regular Activation Sparsity in CNNs","authors":"Tim Hotfilter, Julian Höfer, Fabian Kreß, F. Kempf, Leonhard Kraft, T. Harbaum, J. Becker","doi":"10.1109/AICAS57966.2023.10168566","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168566","url":null,"abstract":"A key challenge in computing convolutional neural networks (CNNs) besides the vast number of computations are the associated numerous energy-intensive transactions from main to local memory. In this paper, we present our methodical approach to maximize and prune coarse-grained regular blockwise sparsity in activation feature maps during CNN inference on dedicated dataflow architectures. Regular sparsity that fits the target accelerator, e.g., a systolic array or vector processor, allows simplified and resource inexpensive pruning compared to irregular sparsity, saving memory transactions and computations. Our threshold-based technique allows maximizing the number of regular sparse blocks in each layer. The wide range of threshold combinations that result from the close correlation between the number of sparse blocks and network accuracy can be explored automatically by our exploration tool Spex. To harness found sparse blocks for memory transaction and MAC operation reduction, we also propose Sparse-Blox, a low-overhead hardware extension for common neural network hardware accelerators. Sparse-Blox adds up to 5× less area than state-of-the-art accelerator extensions that operate on irregular sparsity. Evaluation of our blockwise pruning method with Spex on ResNet-50 and Yolo-v5s shows a reduction of up to 18.9% and 12.6% memory transfers, and 802 M (19.0%) and 1.5 G (24.3%) MAC operations with a 1% or 1 mAP accuracy drop, respectively.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126446064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168589
Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann
Underutilization of compute resources degrades the performance of single-core machine learning (ML) accelerators. Multi-core accelerators therefore divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator's behavior becomes more complex: supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and to direct memory access controller (DMAC) programming. Correctly estimating these effects is crucial for optimizing multi-core accelerators, especially during design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimation in multi-core ML accelerators: simulating only the dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the methodology both precise and fast. We evaluate it on an in-house configurable accelerator with various computational core configurations on two widely used convolutional neural networks (CNNs), estimating accelerator latency with an average error of 4.7%.
{"title":"Simulation-driven Latency Estimations for Multi-core Machine Learning Accelerators","authors":"Yannick Braatz, D. Rieber, T. Soliman, O. Bringmann","doi":"10.1109/AICAS57966.2023.10168589","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168589","url":null,"abstract":"Underutilization of compute resources leads to decreased performance of single-core machine learning (ML) accelerators. Therefore, multi-core accelerators divide the computational load among multiple smaller groups of processing elements (PEs), keeping more resources active in parallel. However, while producing higher throughput, the accelerator behavior becomes more complex. Supplying multiple cores with data demands adjustments to the on-chip memory hierarchy and direct memory access controller (DMAC) programming. Correctly estimating these effects becomes crucial for optimizing multi-core accelerators, especially in design space exploration (DSE). This work introduces a novel semi-simulated prediction methodology for latency estimations in multi-core ML accelerators. Simulating only dynamic system interactions while determining the latency of isolated accelerator elements analytically makes the proposed methodology precise and fast. We evaluate our methodology on an in-house configurable accelerator with various computational cores on two widely used convolutional neural networks (CNNs). We can estimate the accelerator latency with an average error of 4.7%.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129726303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Memristor-Inspired Computation for Epileptiform Signals in Spheroids
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168611
Ivan Diez-de-los-Rios, J. Ephraim, G. Palazzolo, T. Serrano-Gotarredona, G. Panuccio, B. Linares-Barranco
In this paper we present a memristor-inspired computational method for obtaining a type of running "spectrogram" or "fingerprint" of epileptiform activity generated by rodent hippocampal spheroids. It can compute, on the fly and at low computational cost, an alert-level signal for the onset of epileptiform events. We describe the computational method behind this "fingerprint" technique and illustrate it using epileptiform events recorded from hippocampal spheroids with a microelectrode array system.
{"title":"A Memristor-Inspired Computation for Epileptiform Signals in Spheroids","authors":"Ivan Diez-de-los-Rios, J. Ephraim, G. Palazzolo, T. Serrano-Gotarredona, G. Panuccio, B. Linares-Barranco","doi":"10.1109/AICAS57966.2023.10168611","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168611","url":null,"abstract":"In this paper we present a memristor-inspired computational method for obtaining a type of running \"spectrogram\" or \"fingerprint\" of epileptiform activity generated by rodent hippocampal spheroids. It can be used to compute on the fly and with low computational cost an alert-level signal for epileptiform events onset. Here, we describe the computational method behind this \"fingerprint\" technique and illustrate it using epileptiform events recorded from hippocampal spheroids using a microelectrode array system.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133529207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Live Demonstration: SRAM Compute-In-Memory Based Visual & Aural Recognition System
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168569
Anjunyi Fan, Bo Hu, Zhonghua Jin, Haiyue Han, Yaojun Zhang, Yue Yang, Yuchao Yang, Bonan Yan, Ru Huang
We propose a live demonstration at AICAS 2023 of commercial SRAM compute-in-memory (CIM) accelerators. The demonstration includes both visual and aural signal processing and classification performed by SRAM-based CIM engines. The visual part is a low-power face recognition platform that displays and detects the audience's faces in real time. The aural part is a keyword spotting engine with which the audience can interact to control the device for designated tasks (such as "volume up" and "volume down"). The demonstration is interactive and conveys a live feel for the energy efficiency improvements achievable with commercial CIM accelerators.
{"title":"Live Demonstration: SRAM Compute-In-Memory Based Visual & Aural Recognition System","authors":"Anjunyi Fan, Bo Hu, Zhonghua Jin, Haiyue Han, Yaojun Zhang, Yue Yang, Yuchao Yang, Bonan Yan, Ru Huang","doi":"10.1109/AICAS57966.2023.10168569","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168569","url":null,"abstract":"We propose a live demonstration at AICAS’2023 for commercial SRAM compute-in-memory (CIM) accelerators. This live demonstration includes both visual and aural signal processing and classification performed by SRAM-based CIM engines. The visual part is a low-power face recognition platform, which can display and detect the audience’s faces in real-time. The aural part is a key word spotting engine, with which the audience can interact and control the device for designated tasks (such as \"volume up\" and \"volume down\"). This live demonstration is interactive and can bring a live feeling of energy efficiency improvement using the commercial CIM accelerators.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130174180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hierarchically Reconfigurable SRAM-Based Compute-in-Memory Macro for Edge Computing
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168564
Runxi Wang, Xinfei Guo
AI running on the edge requires silicon that can meet demanding performance requirements within an aggressive power and area budget. Frequently updated AI algorithms also demand processors flexible enough to exploit their advantages. Compute-in-memory (CIM) architectures appear to be a promising energy-efficient solution, completing intensive computations in situ where the data are stored. While prior works have made great progress in designing SRAM-based CIM macros with fixed functionality tailored to specific AI applications, the flexibility needed for wider usage scenarios is missing. In this paper, we propose a novel SRAM-based CIM macro that can be hierarchically configured to support various Boolean operations, arithmetic operations, and macro operations. In addition, we demonstrate with an example that the proposed design can be extended to support further essential edge computations with minimal overhead. Compared with existing reconfigurable SRAM-based CIM macros, this work achieves a better balance between reconfigurability and hardware cost by implementing flexibility at multiple design hierarchies.
{"title":"A Hierarchically Reconfigurable SRAM-Based Compute-in-Memory Macro for Edge Computing","authors":"Runxi Wang, Xinfei Guo","doi":"10.1109/AICAS57966.2023.10168564","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168564","url":null,"abstract":"AI running on the edge requires silicon that can meet demanding performance requirements while meeting the aggressive power and area budget. Frequently updated AI algorithms also demand matched processors to well employ their advantages. Compute-in-memory (CIM) architecture appears as a promising energy-efficient solution that completes the intensive computations in-situ where the data are stored. While prior works have shown great progress in designing SRAM-based CIM macros with fixed functionality that were tailored for specific AI applications, the flexibility reserved for wider usage scenarios is missing. In this paper, we propose a novel SRAM-based CIM macro that can be hierarchically configured to support various boolean operations, arithmetic operations, and macro operations. In addition, we demonstrate with an example that the proposed design can be expanded to support more essential edge computations with minimal overhead. Compared with the existing reconfigurable SRAM-based CIM macros, this work achieves a greater balance of reconfigurability vs. hardware cost by implementing flexibility at various design hierarchies.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114621182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FPGA-Based High-Speed and Resource-Efficient 3D Reconstruction for Structured Light System
Pub Date: 2023-06-11 | DOI: 10.1109/AICAS57966.2023.10168616
Feng Bao, Zehua Dong, Jie Yu, Songping Mai
To achieve high-speed, low-resource 3D measurement, we propose a parallel, fully pipelined FPGA architecture for the phase-measuring profilometry algorithm. The proposed system uses four-step phase shifting and Gray-code decoding to generate accurate 3D point clouds. Experimental results show that the architecture can process 12 frames of 720 × 540 images in just 12.2 ms, 110 times faster than the same implementation in software, with the smallest resource consumption among comparable FPGA systems. This makes the proposed system well suited to high-speed embedded 3D shape measurement applications.
{"title":"FPGA-Based High-Speed and Resource-Efficient 3D Reconstruction for Structured Light System","authors":"Feng Bao, Zehua Dong, Jie Yu, Songping Mai","doi":"10.1109/AICAS57966.2023.10168616","DOIUrl":"https://doi.org/10.1109/AICAS57966.2023.10168616","url":null,"abstract":"To achieve high-speed and low-resource consumption 3D measurement, we propose a parallel and full-pipeline FPGA architecture for the phase measuring profilometry algorithm. The proposed system uses four-step phase-shifting and gray code decoding to generate accurate 3D point clouds. Experimental results show that the proposed architecture can process 12 frames of images with a resolution of 720 × 540 in just 12.2 ms, which is 110 times faster than the same implementation in software, and has the smallest resource consumption compared with other similar FPGA systems. This makes the proposed system very suitable for high-speed embedded 3D shape measurement applications.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132935430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}