Rudresh Pratap Singh, Shreyam Kumar, Jugal Gandhi, Diksha Shekhawat, M. Santosh, Jai Gopal Pandey
{"title":"一种基于时域二维OaA的卷积神经网络加速器","authors":"Rudresh Pratap Singh , Shreyam Kumar , Jugal Gandhi , Diksha Shekhawat , M. Santosh , Jai Gopal Pandey","doi":"10.1016/j.memori.2023.100041","DOIUrl":null,"url":null,"abstract":"<div><p>Convolutional neural networks (CNNs) are widely implemented in modern facial recognition systems for image recognition applications. Runtime speed is a critical parameter for real-time systems. Traditional FPGA-based accelerations require either large on-chip memory or high bandwidth and high memory access time that slow down the network. The proposed work uses an algorithm and its subsequent hardware design for a quick CNN computation using an overlap-and-add-based technique in the time domain. In the algorithm, the input images are broken into tiles that can be processed independently without computing overhead in the frequency domain. This also allows for efficient concurrency of the convolution process, resulting in higher throughput and lower power consumption. At the same time, we maintain low on-chip memory requirements necessary for faster and cheaper processor designs. We implemented CNN VGG-16 and AlexNet models with our design on Xilinx Virtex-7 and Zynq boards. The performance analysis of our design provides 48% better throughput than the state-of-the-art AlexNet and uses 68.85% lesser multipliers and other resources than the state-of-the-art VGG-16.</p></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"4 ","pages":"Article 100041"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A time domain 2D OaA-based convolutional neural networks accelerator\",\"authors\":\"Rudresh Pratap Singh , Shreyam Kumar , Jugal Gandhi , Diksha Shekhawat , M. Santosh , Jai Gopal Pandey\",\"doi\":\"10.1016/j.memori.2023.100041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Convolutional neural networks (CNNs) are widely implemented in modern facial recognition systems for image recognition applications. Runtime speed is a critical parameter for real-time systems. Traditional FPGA-based accelerations require either large on-chip memory or high bandwidth and high memory access time that slow down the network. The proposed work uses an algorithm and its subsequent hardware design for a quick CNN computation using an overlap-and-add-based technique in the time domain. In the algorithm, the input images are broken into tiles that can be processed independently without computing overhead in the frequency domain. This also allows for efficient concurrency of the convolution process, resulting in higher throughput and lower power consumption. At the same time, we maintain low on-chip memory requirements necessary for faster and cheaper processor designs. We implemented CNN VGG-16 and AlexNet models with our design on Xilinx Virtex-7 and Zynq boards. 
The performance analysis of our design provides 48% better throughput than the state-of-the-art AlexNet and uses 68.85% lesser multipliers and other resources than the state-of-the-art VGG-16.</p></div>\",\"PeriodicalId\":100915,\"journal\":{\"name\":\"Memories - Materials, Devices, Circuits and Systems\",\"volume\":\"4 \",\"pages\":\"Article 100041\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Memories - Materials, Devices, Circuits and Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S277306462300018X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Memories - Materials, Devices, Circuits and Systems","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S277306462300018X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A time domain 2D OaA-based convolutional neural networks accelerator
Convolutional neural networks (CNNs) are widely used in modern facial recognition systems and other image recognition applications. Runtime speed is a critical parameter for real-time systems. Traditional FPGA-based accelerators require either large on-chip memory or high bandwidth, and suffer long memory access times that slow down the network. The proposed work presents an algorithm and a corresponding hardware design for fast CNN computation using an overlap-and-add (OaA) based technique in the time domain. In the algorithm, the input images are broken into tiles that can be processed independently, without the computational overhead of transforming to the frequency domain. This also allows the convolution process to be parallelized efficiently, resulting in higher throughput and lower power consumption, while keeping the low on-chip memory requirements necessary for faster and cheaper processor designs. We implemented the VGG-16 and AlexNet CNN models with our design on Xilinx Virtex-7 and Zynq boards. The performance analysis of our design shows 48% higher throughput than a state-of-the-art AlexNet implementation and 68.85% fewer multipliers and other resources than a state-of-the-art VGG-16 implementation.
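The core idea behind time-domain overlap-and-add convolution can be illustrated with a minimal software sketch: the input is split into non-overlapping tiles, each tile is convolved independently (and thus can be assigned to a separate processing element), and the partial outputs, which overlap by the kernel size minus one, are summed at their offsets. The code below is an illustrative NumPy/SciPy sketch under assumed tile and kernel sizes, not the paper's hardware design.

```python
# Minimal sketch of 2D overlap-and-add (OaA) convolution in the time/spatial
# domain. The tile size and the use of scipy.signal.convolve2d for the
# per-tile convolution are illustrative assumptions.
import numpy as np
from scipy.signal import convolve2d


def oaa_conv2d(image, kernel, tile=8):
    """Full 2D convolution of `image` with `kernel` via overlap-and-add."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    # A full convolution output is larger than the input by (kernel - 1).
    out = np.zeros((ih + kh - 1, iw + kw - 1),
                   dtype=np.result_type(image, kernel))

    # Break the input into non-overlapping tiles; each tile can be convolved
    # independently, e.g. by parallel processing elements in hardware.
    for r in range(0, ih, tile):
        for c in range(0, iw, tile):
            block = image[r:r + tile, c:c + tile]
            partial = convolve2d(block, kernel, mode="full")
            # Overlap-and-add: partial outputs from neighbouring tiles
            # overlap by (kernel - 1) samples and are summed at their offsets.
            out[r:r + partial.shape[0], c:c + partial.shape[1]] += partial
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32))
    k = rng.standard_normal((3, 3))
    # By linearity of convolution, the tiled result matches a direct
    # full convolution of the whole image.
    assert np.allclose(oaa_conv2d(x, k, tile=8),
                       convolve2d(x, k, mode="full"))
```

Because each tile's convolution depends only on that tile and the kernel, the per-tile work maps naturally onto independent hardware units, which is the concurrency the abstract refers to; the hardware specifics (dataflow, multiplier counts, memory organisation) are described in the full paper.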