3D-NWA: A Nested-Winograd Accelerator for 3D CNNs
Huafeng Ye, Huipeng Deng, Jian Wang, Mingyu Wang, Zhiyi Yu
2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), published 2022-10-28
DOI: https://doi.org/10.1109/ICTA56932.2022.9963033
3D convolutional neural networks (3D CNNs) perform better in scenarios such as video understanding and 3D medical image diagnosis. As the dimension and size of the convolution kernel grow, the computational complexity and implementation difficulty of CNNs increase sharply. The Winograd transformation can significantly reduce the number of multiplications in convolution operations, but large convolution filters introduce numerical instability. In this article, we present a novel method, the 3D nested Winograd algorithm, to address this problem. Compared with the state-of-the-art OLA-Winograd algorithm, the proposed algorithm reduces the number of multiplications by 1.72× to 5.83× for 5 × 5 × 5 to 9 × 9 × 9 convolutions. Finally, we demonstrate the efficiency of 3D-NWA on an FPGA platform (Xilinx VCU118) and achieve the highest DSP efficiency, up to 4.67× that of state-of-the-art accelerators.
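
To make the claim about reduced multiplications concrete, the following is a minimal sketch of the standard 1D Winograd minimal filtering algorithm F(2, 3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is not the paper's 3D nested Winograd algorithm, whose details the abstract does not give. The function names (`direct_f23`, `winograd_f23`, `mults_3d`) are hypothetical and chosen only for illustration, and the 3D count assumes the same small tile is applied along each of the three dimensions.

```python
# Illustrative sketch only: standard Winograd F(2, 3), not the paper's
# nested 3D algorithm. All function names here are hypothetical.

def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs using 4 element-wise multiplications."""
    # Filter transform; in a CNN this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform uses additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # The 4 multiplications.
    m0, m1, m2, m3 = v0 * u0, v1 * u1, v2 * u2, v3 * u3
    # Output transform uses additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


def mults_3d(m, r):
    """Multiplication counts for one tile of m*m*m outputs with an r*r*r
    filter, assuming F(m, r) is applied along each dimension."""
    direct = (m ** 3) * (r ** 3)
    winograd = (m + r - 1) ** 3
    return direct, winograd


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, 1.5, -2.0]
    print(direct_f23(d, g), winograd_f23(d, g))
    print(mults_3d(2, 3))
```

Under these assumptions, a 2 × 2 × 2 output tile with a 3 × 3 × 3 filter needs 64 multiplications rather than 216. Larger tiles save more multiplications, but their transform matrices contain constants of increasingly different magnitudes, which is the source of the numerical instability mentioned above for large filters.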