A time domain 2D OaA-based convolutional neural networks accelerator

Rudresh Pratap Singh, Shreyam Kumar, Jugal Gandhi, Diksha Shekhawat, M. Santosh, Jai Gopal Pandey
Journal: Memories - Materials, Devices, Circuits and Systems, volume 4, Article 100041
Publication date: 2023-07-01
DOI: 10.1016/j.memori.2023.100041
URL: https://www.sciencedirect.com/science/article/pii/S277306462300018X

Abstract

Convolutional neural networks (CNNs) are widely used for image recognition in modern facial recognition systems. Runtime speed is a critical parameter for real-time systems. Traditional FPGA-based accelerators require either large on-chip memory or high bandwidth, and their long memory access times slow down the network. The proposed work presents an algorithm and a corresponding hardware design for fast CNN computation using an overlap-and-add (OaA) technique in the time domain. In the algorithm, the input images are broken into tiles that can be processed independently, without the computational overhead of working in the frequency domain. This also allows the convolution to be parallelized efficiently, resulting in higher throughput and lower power consumption. At the same time, we maintain the low on-chip memory requirements necessary for faster and cheaper processor designs. We implemented the VGG-16 and AlexNet CNN models with our design on Xilinx Virtex-7 and Zynq boards. Performance analysis shows that our design achieves 48% better throughput than the state-of-the-art AlexNet implementation and uses 68.85% fewer multipliers and other resources than the state-of-the-art VGG-16 implementation.
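
The tiling idea behind time-domain overlap-and-add can be illustrated with a small software sketch. This is not the authors' hardware design: the function names `conv2d_full` and `conv2d_oaa` and the tile size are hypothetical, chosen only for illustration. Each tile is convolved on its own directly in the time domain, and the partial results, which overlap by the kernel size minus one, are summed into the output.

```python
import numpy as np

def conv2d_full(x, k):
    """Direct time-domain full 2D convolution (no FFT involved)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1), dtype=np.float64)
    # Each kernel tap contributes a scaled, shifted copy of the input.
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += k[i, j] * x
    return out

def conv2d_oaa(x, k, tile=4):
    """Overlap-and-add: convolve each tile independently, then sum.

    `tile` is an illustrative parameter, not a value from the paper.
    """
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H + kh - 1, W + kw - 1), dtype=np.float64)
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            block = x[r:r + tile, c:c + tile]  # edge tiles may be smaller
            part = conv2d_full(block, k)       # independent per-tile work
            ph, pw = part.shape
            # Neighbouring partial results overlap by (kh-1, kw-1); add them.
            out[r:r + ph, c:c + pw] += part
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(10, 7)).astype(np.float64)
kernel = rng.integers(-2, 3, size=(3, 3)).astype(np.float64)
tiled = conv2d_oaa(image, kernel, tile=4)
direct = conv2d_full(image, kernel)
print(np.allclose(tiled, direct))
```

Because the per-tile convolutions do not depend on each other, they could in principle be dispatched to parallel compute units, and only one tile plus the kernel needs to be held in local memory at a time. CNN layers typically compute cross-correlation rather than true convolution; the same tiling applies with a flipped kernel.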
