{"title":"FPGA-aware automatic acceleration framework for vision transformer with mixed-scheme quantization: late breaking results","authors":"Mengshu Sun, Z. Li, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, M. Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang","doi":"10.1145/3489517.3530618","DOIUrl":null,"url":null,"abstract":"Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework exploring model quantization. Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.31% to 1.25% higher Top-1 accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around 5.6× improvement on the frame rate (i.e., 56.4 FPS vs. 10.0 FPS) with 0.83% accuracy drop for DeiT-base.","PeriodicalId":373005,"journal":{"name":"Proceedings of the 59th ACM/IEEE Design Automation Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 59th ACM/IEEE Design Automation Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3489517.3530618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Vision transformers (ViTs) have emerged with significantly improved accuracy on computer vision tasks. However, their complex architecture and enormous computation/storage demands create an urgent need for new hardware accelerator design methodologies. This work proposes an FPGA-aware automatic ViT acceleration framework built on a proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework to explore model quantization. Compared with the state-of-the-art ViT quantization work (an algorithm-only approach without hardware acceleration), our quantization achieves 0.31% to 1.25% higher Top-1 accuracy at the same bit-width. Compared with a 32-bit floating-point baseline FPGA accelerator, our accelerator achieves an approximately 5.6× higher frame rate (56.4 FPS vs. 10.0 FPS) with only a 0.83% accuracy drop for DeiT-base.
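The abstract does not spell out the mixed-scheme quantization itself. In the authors' related FPGA work, "mixed-scheme" typically denotes combining uniform fixed-point quantization (whose multiplies map onto DSP slices) with power-of-two quantization (whose multiplies reduce to shifts in LUT logic). The sketch below is a minimal NumPy illustration under that assumption; the function names, the `pot_ratio` parameter, and the row-splitting policy are hypothetical placeholders, not the paper's actual algorithm.

```python
import numpy as np

def quantize_fixed_point(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Uniform fixed-point quantization: snap each weight to an evenly
    spaced grid; on an FPGA these multiplies map naturally onto DSPs."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def quantize_power_of_two(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Power-of-two quantization: snap each weight magnitude to the nearest
    representable 2^k level, so multiplies become shifts (LUT-friendly)."""
    mag = np.abs(w)
    max_exp = np.floor(np.log2(mag.max()))
    # Descending grid of 2^k levels within the bit budget.
    levels = 2.0 ** (max_exp - np.arange(2 ** (bits - 1)))
    idx = np.abs(mag[..., None] - levels).argmin(axis=-1)
    return np.sign(w) * levels[idx]

def mixed_scheme_quantize(w: np.ndarray, bits: int = 4,
                          pot_ratio: float = 0.5) -> np.ndarray:
    """Hypothetical mixed-scheme quantizer: route a fraction of the rows
    (output channels) to the power-of-two scheme and the rest to fixed-point.
    A real framework would pick the split to balance DSP vs. LUT utilization;
    taking the first rows is only a placeholder policy. Assumes
    0 < pot_ratio < 1 so both slices are non-empty."""
    n_pot = int(pot_ratio * w.shape[0])
    out = np.empty_like(w)
    out[:n_pot] = quantize_power_of_two(w[:n_pot], bits)
    out[n_pot:] = quantize_fixed_point(w[n_pot:], bits)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 16)).astype(np.float32)
    wq = mixed_scheme_quantize(w, bits=4, pot_ratio=0.5)
    print("mean abs quantization error:", np.abs(w - wq).mean())
```

The appeal of such a split on an FPGA is that the two schemes consume different resource types, so quantizing part of the network with shifts and part with fixed-point multiplies lets the accelerator use DSPs and LUTs concurrently rather than bottlenecking on one.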