Design and Implementation of a Fast Convolution Algorithm for Embedded Platform
Zhenyu Yin, Feiqing Zhang, Jiangbo Wang, Fulong Xu, Chao Fan
2022 11th International Conference of Information and Communication Technology (ICTech), published 2022-02-01
DOI: 10.1109/ICTech55460.2022.00041
Citations: 1
Abstract
In recent years, deep learning has been applied in industry with great success. As demand for lightweight intelligent devices grows, deploying deep learning models on embedded platforms to meet users' real-time performance needs has become a trend in the development of intelligent systems. However, in the pursuit of higher accuracy, existing deep learning frameworks are becoming richer in functionality and more computationally complex. Their large memory and compute requirements make it challenging to deploy neural network computing frameworks on embedded platforms with limited resources and computational power. To address real-time image processing, this paper proposes the WPOC algorithm, which is based on the Winograd algorithm, and integrates it into the Darknet framework. The implementation passed testing on the ZYNQ-7010 platform. The results show that the proposed WPOC algorithm speeds up image recognition by about six times on the VGG-16 model while maintaining the same accuracy.
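For readers unfamiliar with the Winograd algorithm that the abstract builds on, the following is a minimal sketch of the standard one-dimensional minimal filtering case F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is not the WPOC algorithm, whose details the abstract does not give; the function names `direct_f23` and `winograd_f23` are illustrative assumptions.

```python
def direct_f23(d, g):
    """Reference result: two outputs of a 3-tap filter over four inputs.

    Uses 6 multiplications. `d` has 4 input samples, `g` has 3 filter taps.
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Standard Winograd F(2,3): same two outputs with 4 multiplications.

    Illustrative only; this is the textbook transform, not the paper's WPOC.
    """
    # Filter transform. In a convolution layer the filter is fixed,
    # so these values could be computed once and reused.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The four element-wise multiplications.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform (additions and subtractions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [2, 4, 6]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

The saving comes from trading multiplications for additions, which is why the approach tends to suit 3x3 convolution layers such as those in VGG-16; the two-dimensional case nests this transform over rows and columns.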