Group Vectored Absolute-Value-Subtraction Cell Array for the Efficient Acceleration of AdderNet

2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). Pub Date: 2023-06-11. DOI: 10.1109/AICAS57966.2023.10168637

Jiahao Chen, Wanbo Hu, Wenling Ma, Zhilin Zhang, Mingqiang Huang


Citations: 0

Abstract

Convolutional neural networks (CNNs) are widely used to improve performance on artificial intelligence (AI) tasks, but CNN models are usually computationally intensive. AdderNet, a CNN variant built on the absolute-value-subtraction (ABS) operation, was recently proposed to reduce computational complexity and energy consumption. Dedicated hardware for it, however, has rarely been explored. In this work, we propose an energy-efficient AdderNet accelerator to address this gap. At the hardware architecture level, we develop a flexible, group-vectored systolic array that balances circuit area, power, and speed. Thanks to the low delay of the ABS operation, the systolic array can run at clock frequencies of up to 2 GHz, and its power and area efficiency are about 3× better than those of its CNN counterpart. At the processing-element level, we propose a new ABS cell based on algorithm optimization, which delivers about 10% higher performance than the naive design. Finally, the accelerator is deployed on an FPGA platform to accelerate the AdderNet version of ResNet-18 as a case study. The peak throughput is 424.2 GOP/s, which is much higher than that of previous works.
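For readers unfamiliar with the ABS operation, the following minimal sketch contrasts a conventional multiply-accumulate response with an AdderNet-style response, which accumulates negated absolute differences between inputs and weights. It illustrates only the arithmetic as commonly defined for AdderNet; it is not the authors' ABS cell or systolic array, whose design the abstract does not describe. The function names and the 2×2 example values are assumptions chosen for illustration.

```python
import numpy as np


def mac_response(patch, kernel):
    """Conventional CNN response: multiply-accumulate over the window."""
    return float(np.sum(patch * kernel))


def abs_response(patch, kernel):
    """AdderNet-style response: negated sum of absolute differences.

    Only subtraction, absolute value, and accumulation are needed,
    so no multiplier is involved.
    """
    return float(-np.sum(np.abs(patch - kernel)))


# Hypothetical 2x2 input window and filter (illustrative values only).
patch = np.array([[1, 2], [3, 4]], dtype=np.int32)
kernel = np.array([[1, 0], [2, 4]], dtype=np.int32)

mac = mac_response(patch, kernel)   # 1*1 + 2*0 + 3*2 + 4*4 = 23
ads = abs_response(patch, kernel)   # -(0 + 2 + 1 + 0) = -3
print(mac, ads)
```

The ABS response is largest (zero) when the window matches the filter exactly, so it behaves as a similarity measure while avoiding multiplications, which is the property the abstract credits for the low delay of the ABS operation.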

