Authors: Haochuan Wan, Chaolin Rao, Yueyang Zheng, Pingqiang Zhou, Xin Lou
Venue: 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
Publication date: 2023-06-11
DOI: 10.1109/AICAS57966.2023.10168602 (https://doi.org/10.1109/AICAS57966.2023.10168602)
A Systolic Array with Activation Stationary Dataflow for Deep Fully-Connected Networks
This paper presents an activation stationary (AS) dataflow suited to networks composed purely of fully-connected (FC) layers. The proposed AS dataflow is shown to reduce the memory size required in hardware and to improve energy efficiency by reducing data movement. Based on the AS dataflow, an output stationary (OS) systolic array is proposed to compute FC networks. To evaluate the design, we implement an accelerator for the FC-based implicit representation for MRI (IREM) algorithm and develop a proof-of-concept demonstration system on a field-programmable gate array (FPGA). We also map the IREM accelerator to 40 nm CMOS technology and compare it with CPU-based, GPU-based, and ASIC implementations.
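The abstract does not spell out the array's microarchitecture, so the following is only an illustrative sketch of the general idea, not the authors' design. It assumes a textbook output stationary array: each processing element (PE) keeps one accumulator in place while activations flow rightward and weights flow downward with skewed timing. It also assumes one plausible reading of "activation stationary", namely that a layer's outputs are reused directly as the next layer's inputs so that only weights are streamed in. The function names `os_systolic_matmul` and `fc_network`, and the use of ReLU between layers, are hypothetical choices made for the example.

```python
def os_systolic_matmul(A, B):
    """Cycle-level model of an output stationary systolic array.

    A is an M x K activation matrix and B is a K x N weight matrix,
    both given as lists of lists. PE(i, j) accumulates C[i][j] in place.
    """
    M, K, N = len(A), len(A[0]), len(B[0])
    acc = [[0] * N for _ in range(M)]    # stationary partial sums, one per PE
    a_reg = [[0] * N for _ in range(M)]  # activation currently held in each PE
    b_reg = [[0] * N for _ in range(M)]  # weight currently held in each PE

    # The last product reaches PE(M-1, N-1) at cycle (M-1) + (N-1) + (K-1).
    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    # Row i of A enters from the left edge, delayed by i cycles.
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # pass activation rightward
                if i == 0:
                    # Column j of B enters from the top edge, delayed by j cycles.
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # pass weight downward
        a_reg, b_reg = new_a, new_b
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


def fc_network(acts, layers):
    """Run a stack of FC layers, reusing each output as the next input.

    Assumption: activations never leave the (modelled) on-chip buffer;
    only the weight matrices in `layers` are streamed in, one per layer.
    ReLU is applied after every layer except the last.
    """
    for idx, W in enumerate(layers):
        acts = os_systolic_matmul(acts, W)
        if idx < len(layers) - 1:
            acts = [[max(0, v) for v in row] for row in acts]
    return acts


if __name__ == "__main__":
    x = [[1, -1]]                        # one input sample with two features
    weights = [[[1, 0], [0, -1]], [[2], [3]]]
    print(fc_network(x, weights))
```

In this model the intermediate activations are never written back to an external buffer between layers, which is the kind of data movement the abstract says the AS dataflow reduces; how the actual design achieves this is not described in the abstract.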