A Memory-Efficient Hardware Architecture for Deformable Convolutional Networks

Yue Yu, Jiapeng Luo, W. Mao, Zhongfeng Wang
{"title":"A Memory-Efficient Hardware Architecture for Deformable Convolutional Networks","authors":"Yue Yu, Jiapeng Luo, W. Mao, Zhongfeng Wang","doi":"10.1109/SiPS52927.2021.00033","DOIUrl":null,"url":null,"abstract":"In recent years, deformable convolutional networks are widely adopted in object detection tasks and have achieved outstanding performance. Compared with conventional convolution, the deformable convolution has an irregular receptive field to adapt to objects with different sizes and shapes. However, the irregularity of the receptive field causes inefficient access to memory and increases the complexity of control logic. Toward hardware-friendly implementation, prior works change the characteristics of deformable convolution by restricting the receptive field, leading to accuracy degradation. In this paper, we develop a dedicated Sampling Core to sample and rearrange the input pixels, enabling the convolution array to access the inputs regularly. In addition, a memory-efficient dataflow is introduced to match the processing speed of the Sampling Core and convolutional array, which improves hardware utilization and reduces access to off-chip memory. Based on these optimizations, we propose a novel hardware architecture for the deformable convolution network, which is the first work to accelerate the original deformable convolution network. With the design of the memory-efficient architecture, the access to the off-chip memory is reduced significantly. We implement it on Xilinx Virtex-7 FPGA, and experiments show that the energy efficiency reaches 50.29 GOPS/W, which is 2.5 times higher compared with executing the same network on GPU.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SiPS52927.2021.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

In recent years, deformable convolutional networks have been widely adopted in object detection tasks and have achieved outstanding performance. Compared with conventional convolution, deformable convolution has an irregular receptive field that adapts to objects of different sizes and shapes. However, the irregularity of the receptive field causes inefficient memory access and increases the complexity of the control logic. To obtain hardware-friendly implementations, prior works change the characteristics of deformable convolution by restricting the receptive field, which leads to accuracy degradation. In this paper, we develop a dedicated Sampling Core to sample and rearrange the input pixels, enabling the convolution array to access the inputs regularly. In addition, a memory-efficient dataflow is introduced to match the processing speeds of the Sampling Core and the convolution array, which improves hardware utilization and reduces access to off-chip memory. Based on these optimizations, we propose a novel hardware architecture for deformable convolutional networks, which is the first work to accelerate the original deformable convolution. With this memory-efficient architecture, access to off-chip memory is reduced significantly. We implement the design on a Xilinx Virtex-7 FPGA, and experiments show that the energy efficiency reaches 50.29 GOPS/W, 2.5 times that of executing the same network on a GPU.
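For reference, below is a minimal NumPy sketch of the sampling step that deformable convolution performs, and that the paper's Sampling Core is designed to handle in hardware: each tap of a regular 3x3 kernel grid is displaced by a learned offset, and the input is read at the resulting fractional position via bilinear interpolation. The function names, shapes, and single-channel setup here are illustrative assumptions for exposition, not the paper's implementation.

```python
# Sketch of deformable-convolution sampling with bilinear interpolation.
# Assumes a single input channel and one output position (illustrative only).
import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at fractional position (y, x)."""
    H, W = feature.shape
    # Clamp so the four neighbouring pixels stay in bounds.
    y = np.clip(y, 0, H - 1)
    x = np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feature[y0, x0] +
            (1 - wy) * wx       * feature[y0, x1] +
            wy       * (1 - wx) * feature[y1, x0] +
            wy       * wx       * feature[y1, x1])

def deformable_conv_at(feature, weight, offsets, p0):
    """One output value of a 3x3 deformable convolution centred at p0.

    offsets has shape (3, 3, 2): a learned (dy, dx) per kernel tap that
    deforms the otherwise regular 3x3 receptive field around p0.
    """
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):        # regular grid rows
        for j, dx in enumerate((-1, 0, 1)):    # regular grid columns
            off_y, off_x = offsets[i, j]
            y = p0[0] + dy + off_y             # deformed sample position
            x = p0[1] + dx + off_x
            out += weight[i, j] * bilinear_sample(feature, y, x)
    return out
```

Because the offset positions are data-dependent and fractional, a naive implementation gathers scattered addresses for every output pixel; pre-sampling and rearranging these values into a regular layout, as the paper's Sampling Core does, is what lets the convolution array access its inputs regularly.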