Comparative Analysis of Hardware Implementations of a Convolutional Neural Network

Gabriel H. Eisenkraemer, L. Oliveira, E. Carara
{"title":"卷积神经网络硬件实现的比较分析","authors":"Gabriel H. Eisenkraemer, L. Oliveira, E. Carara","doi":"10.1109/SBCCI55532.2022.9893234","DOIUrl":null,"url":null,"abstract":"Artificial Neural Networks (ANNs) have become the most popular machine learning technique for data processing, performing central functions in a wide variety of applications. In many cases, these models are used within constrained scenarios, in which a local execution of the algorithm is necessary to avoid latency and safety issues of remote computing (e.g, autonomous vehicles, edge devices in IoT networks). Even so, the known computational complexity of these models is still a challenge in such contexts, as implementation costs and performance requirements are difficult to balance. In these scenarios, pa-rameter quantization techniques are essential to simplifying the operations and memory footprint to make the hardware implementation more viable. In this paper, a case study is devised in which a convolutional neural network (CNN) architecture is fully implemented in hardware with three different optimization strategies, having parameters mapped to low bit-width fixed point integers with a power-of-two quantization scheme. Both ASIC and FPGA implementation flows are followed, allowing for an in-depth analysis of each circuit version. The obtained results show that the adopted quantization process enables optimizations on the implemented circuit, reducing about 50% of the circuitry area and 87.5% of the memory requirement. At the same time, the application performance was kept at the same level.","PeriodicalId":231587,"journal":{"name":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","volume":"39 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative Analysis of Hardware Implementations of a Convolutional Neural Network\",\"authors\":\"Gabriel H. Eisenkraemer, L. Oliveira, E. Carara\",\"doi\":\"10.1109/SBCCI55532.2022.9893234\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Artificial Neural Networks (ANNs) have become the most popular machine learning technique for data processing, performing central functions in a wide variety of applications. In many cases, these models are used within constrained scenarios, in which a local execution of the algorithm is necessary to avoid latency and safety issues of remote computing (e.g, autonomous vehicles, edge devices in IoT networks). Even so, the known computational complexity of these models is still a challenge in such contexts, as implementation costs and performance requirements are difficult to balance. In these scenarios, pa-rameter quantization techniques are essential to simplifying the operations and memory footprint to make the hardware implementation more viable. In this paper, a case study is devised in which a convolutional neural network (CNN) architecture is fully implemented in hardware with three different optimization strategies, having parameters mapped to low bit-width fixed point integers with a power-of-two quantization scheme. Both ASIC and FPGA implementation flows are followed, allowing for an in-depth analysis of each circuit version. The obtained results show that the adopted quantization process enables optimizations on the implemented circuit, reducing about 50% of the circuitry area and 87.5% of the memory requirement. 
At the same time, the application performance was kept at the same level.\",\"PeriodicalId\":231587,\"journal\":{\"name\":\"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)\",\"volume\":\"39 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBCCI55532.2022.9893234\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBCCI55532.2022.9893234","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Artificial Neural Networks (ANNs) have become the most popular machine learning technique for data processing, performing central functions in a wide variety of applications. In many cases, these models are used within constrained scenarios, in which a local execution of the algorithm is necessary to avoid the latency and safety issues of remote computing (e.g., autonomous vehicles, edge devices in IoT networks). Even so, the known computational complexity of these models is still a challenge in such contexts, as implementation costs and performance requirements are difficult to balance. In these scenarios, parameter quantization techniques are essential for simplifying operations and shrinking the memory footprint, making a hardware implementation more viable. In this paper, a case study is devised in which a convolutional neural network (CNN) architecture is fully implemented in hardware with three different optimization strategies, with parameters mapped to low bit-width fixed-point integers under a power-of-two quantization scheme. Both ASIC and FPGA implementation flows are followed, allowing for an in-depth analysis of each circuit version. The obtained results show that the adopted quantization process enables optimizations of the implemented circuit, reducing circuit area by about 50% and memory requirements by 87.5%, while application performance was kept at the same level.
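The abstract does not include code; as an illustration of the power-of-two scheme it describes, the sketch below rounds each weight to the nearest signed power of two, so that a multiply in the convolution datapath can be realized as a bit shift. The reported 87.5% memory reduction would be consistent with, for example, 32-bit parameters shrinking to 4-bit codes, though the exact bit widths are not stated in the abstract. The function name and the `exp_bits` exponent range are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def po2_quantize(w, exp_bits=3):
    """Round each weight to the nearest signed power of two.

    Illustrative only: `exp_bits` bounds the exponent range so the
    quantized value fits a small fixed-point code; the paper's exact
    scheme and bit widths are not specified in the abstract.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    nonzero = mag > 0
    exp = np.zeros_like(mag)
    # Nearest power-of-two exponent for each nonzero magnitude.
    exp[nonzero] = np.round(np.log2(mag[nonzero]))
    # Clamp to the exponents an exp_bits-wide code is assumed to cover,
    # here [-2**(exp_bits - 1), 0].
    exp = np.clip(exp, -(2 ** (exp_bits - 1)), 0)
    return sign * 2.0 ** exp  # zero weights stay zero (sign == 0)

weights = np.array([0.31, -0.07, 0.52, -0.9, 0.0])
print(po2_quantize(weights))  # [ 0.25 -0.0625  0.5 -1.  0. ]

# In hardware, multiplying an activation by a power-of-two weight is a
# bit shift: x * 2**e == x << e for e >= 0, or x >> -e for e < 0.
```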