A 109-GOPs/W FPGA-Based Vision Transformer Accelerator With Weight-Loop Dataflow Featuring Data Reusing and Resource Saving

IF 11.1 · CAS Tier 1, Engineering & Technology · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC · IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2024-08-06 · DOI: 10.1109/TCSVT.2024.3439600
Yueqi Zhang;Lichen Feng;Hongwei Shan;Zhangming Zhu
IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13596-13610.
Citations: 0

Abstract

A 109-GOPs/W FPGA-Based Vision Transformer Accelerator With Weight-Loop Dataflow Featuring Data Reusing and Resource Saving
Vision Transformer (ViT) models have demonstrated excellent performance in computer vision tasks, but the large amount of computation and memory access required for massive matrix multiplications leads to degraded hardware performance compared to convolutional neural networks (CNNs). In this paper, we propose a ViT accelerator with a novel "Weight-Loop" dataflow and its computing unit for efficient matrix multiplication. Through data partitioning and rearrangement, the number of memory accesses and registers is greatly reduced, and adder trees are eliminated. A computation pipeline with the proposed dataflow scheduling method is constructed to maintain a high utilization rate through zero-bubble switching. Moreover, a novel accurate dual INT8 multiply-accumulate (DI8MAC) method for DSP optimization is introduced, which eliminates the additional correction circuits through weight encoding. Verified on the Xilinx XCZU9EG FPGA, the proposed ViT accelerator achieves the lowest inference latencies of 3.91 ms and 13.98 ms for ViT-S and ViT-B, respectively. The throughput of the accelerator reaches up to 2330.2 GOPs with an energy efficiency of 109 GOPs/W, a significant improvement over state-of-the-art works.
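The abstract credits much of the memory-access reduction to reusing each loaded weight block across many activations before fetching the next one. The paper's exact Weight-Loop partitioning is not reproduced here; the sketch below is a generic weight-stationary tiled matrix multiply that illustrates the reuse principle (tile size and loop order are illustrative assumptions):

```python
import numpy as np

def weight_stationary_matmul(A, W, tile=4):
    """Weight-stationary tiled matmul sketch: each weight tile is
    loaded once and then reused across every activation row before
    the next tile is fetched, so weight-memory traffic scales with
    the number of tiles rather than with M * K * N."""
    M, K = A.shape
    K2, N = W.shape
    assert K == K2
    C = np.zeros((M, N), dtype=np.int64)
    for k0 in range(0, K, tile):              # outer loops walk the weight tiles
        for n0 in range(0, N, tile):
            w_tile = W[k0:k0 + tile, n0:n0 + tile]   # "load" one weight tile
            for m in range(M):                        # reuse it for all M rows
                C[m, n0:n0 + tile] += A[m, k0:k0 + tile] @ w_tile
    return C
```

In hardware, the innermost reuse loop corresponds to streaming activations past a register-resident weight tile, which is what removes repeated weight fetches from off-chip memory.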
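The DI8MAC unit packs two INT8 multiplications into one wide DSP multiply; the paper's contribution is a weight encoding that removes the usual sign-correction circuit. That encoding is not detailed in the abstract, so the sketch below shows only the underlying standard dual-INT8 packing idea (two weights in one wide operand, one multiply, two products recovered), with the correction handled in software:

```python
def dual_int8_mac(w0, w1, x):
    """Compute w0*x and w1*x with a single wide multiplication,
    mimicking two-INT8-per-DSP packing. Sketch of the generic
    technique, not the paper's exact DI8MAC encoding."""
    W = 18                                # bit offset between the packed weights
    packed = (w1 << W) + w0               # pack both int8 weights into one operand
    wide = packed * x                     # one wide multiplication does both products
    lo = wide & ((1 << W) - 1)            # low field holds w0*x modulo 2**18
    if lo >= 1 << (W - 1):                # sign-extend the 18-bit low field
        lo -= 1 << W                      # (this step is the "correction" DI8MAC avoids)
    hi = (wide - lo) >> W                 # high field then yields w1*x exactly
    return lo, hi
```

Because an INT8-by-INT8 product fits in 16 bits, the 18-bit fields never overlap, so both products are recovered exactly; the only extra logic is the sign correction on the low field, which is precisely the circuitry the proposed weight encoding eliminates.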
Source journal: IEEE Transactions on Circuits and Systems for Video Technology · CiteScore: 13.80 · Self-citation rate: 27.40% · Articles per year: 660 · Review time: 5 months
About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.