An Efficient Accelerator with Winograd for Novel Convolutional Neural Networks
Authors: Zhijian Lin, Meng Zhang, Dongpeng Weng, Fei Liu
Published in: 2022 5th International Conference on Circuits, Systems and Simulation (ICCSS)
Publication date: 2022-05-13
DOI: 10.1109/iccss55260.2022.9802420 (https://doi.org/10.1109/iccss55260.2022.9802420)
Citations: 1
Abstract
In recent years, Convolutional Neural Networks (CNNs) have trended toward lower computational cost in pursuit of lightweight models. In lightweight CNNs, depthwise separable convolution (DSC) is becoming the mainstream method. Within DSC, however, the pointwise convolution (PWC) with $1\times 1$ filters still carries a large number of parameters and computations. This paper proposes a more efficient convolution algorithm, kernel shared group convolution (KSGC), to replace PWC. KSGC combines channel information and can be viewed as a single convolution kernel sliding along the channel dimension. In addition, the Winograd algorithm is used to reduce the number of multiplications required by KSGC. A CNN accelerator built around a novel processing element (PE) that performs 1-D Winograd for KSGC was implemented on an Ultra96-V2 field-programmable gate array (FPGA). At a 200 MHz clock frequency, the accelerator achieved a computational performance of 52.7 GOPS and a performance-to-power ratio of 10.42 GOPS/W.
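The saving from 1-D Winograd can be illustrated with a minimal Python sketch. The abstract does not state the tile size, the kernel length, or how KSGC arranges its groups, so the sketch assumes the standard Winograd F(2,3) form (two outputs from a 3-tap kernel) and a single shared kernel sliding over the channel values of one pixel. All function names are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def direct_corr1d(signal, kernel):
    """Reference sliding-window correlation: len(kernel) multiplications per output."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def transform_kernel(kernel):
    """Winograd F(2,3) kernel transform; computed once because the kernel is shared."""
    g0, g1, g2 = kernel
    half = Fraction(1, 2)  # exact arithmetic keeps the comparison free of rounding
    return (g0, (g0 + g1 + g2) * half, (g0 - g1 + g2) * half, g2)


def winograd_f23_tile(d, u):
    """Two outputs from four inputs using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def winograd_corr1d(signal, kernel):
    """Tile the signal into overlapping 4-element windows with a stride of 2."""
    if len(kernel) != 3:
        raise ValueError("this sketch assumes a 3-tap kernel")
    n_out = len(signal) - 2
    if n_out <= 0 or n_out % 2 != 0:
        raise ValueError("this sketch assumes an even, positive number of outputs")
    u = transform_kernel(kernel)
    out = []
    for i in range(0, n_out, 2):
        out.extend(winograd_f23_tile(signal[i:i + 4], u))
    return out


# Assumed example: 8 channel values at one pixel and one shared 3-tap kernel.
channel_values = [1, 2, 3, 4, 5, 6, 7, 8]
shared_kernel = [2, -1, 3]
assert winograd_corr1d(channel_values, shared_kernel) == direct_corr1d(channel_values, shared_kernel)
```

Under these assumptions each pair of outputs costs 4 multiplications rather than 6, at the price of a few extra additions and a one-time kernel transform. Whether the accelerator's PE uses this exact tile size is not stated in the abstract.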