An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications

IF 4 2区 工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC IEEE Transactions on Circuits and Systems II: Express Briefs Pub Date : 2024-09-17 DOI:10.1109/TCSII.2024.3462560
Rujian Cao;Zhongyu Zhao;Ka-Fai Un;Wei-Han Yu;Rui P. Martins;Pui-In Mak
{"title":"An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications","authors":"Rujian Cao;Zhongyu Zhao;Ka-Fai Un;Wei-Han Yu;Rui P. Martins;Pui-In Mak","doi":"10.1109/TCSII.2024.3462560","DOIUrl":null,"url":null,"abstract":"Dataflow management provides limited performance improvement to the transformer model due to its lesser weight reuse than the convolution neural network. The cosFormer reduced computational complexity while achieving comparable performance to the vanilla transformer for natural language processing tasks. However, the unstructured sparsity in the cosFormer makes it a challenge to be implemented efficiently. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces the total memory access by balancing the memory accesses of the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning cooperating with PUSH to further increase the structured sparsity of the model. Through verification on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPs/W for HotpotQA dataset with long sequences.","PeriodicalId":13101,"journal":{"name":"IEEE Transactions on Circuits and Systems II: Express Briefs","volume":"71 11","pages":"4688-4692"},"PeriodicalIF":4.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems II: Express Briefs","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10681589/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Dataflow management provides limited performance improvement to the transformer model due to its lesser weight reuse than the convolution neural network. The cosFormer reduced computational complexity while achieving comparable performance to the vanilla transformer for natural language processing tasks. However, the unstructured sparsity in the cosFormer makes it a challenge to be implemented efficiently. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces the total memory access by balancing the memory accesses of the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning cooperating with PUSH to further increase the structured sparsity of the model. Through verification on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPs/W for HotpotQA dataset with long sequences.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于 FPGA 的变压器加速器,可为答题应用提供并行非结构稀疏性处理功能
与卷积神经网络相比,数据流管理的权重重复利用率较低,因此对变换器模型的性能提升有限。cosFormer 降低了计算复杂度,同时在自然语言处理任务中实现了与 vanilla transformer 相当的性能。然而,cosFormer 中的非结构稀疏性使其难以有效实现。本摘要提出了一种并行非结构稀疏性处理(PUSH)方案,以高效计算稀疏密集矩阵乘法(SDMM)。该方案将非结构稀疏性转化为结构稀疏性,并通过平衡 SDMM 中稀疏矩阵和密集矩阵的内存访问来减少总内存访问。我们还采用了非结构化权重剪枝技术与 PUSH 技术相结合,进一步提高了模型的结构稀疏性。通过在 FPGA 平台上的验证,针对长序列的 HotpotQA 数据集,所提出的加速器实现了 2.82 TOPS 的吞吐量和 144.8 GOPs/W 的能效。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Transactions on Circuits and Systems II: Express Briefs
IEEE Transactions on Circuits and Systems II: Express Briefs 工程技术-工程:电子与电气
CiteScore
7.90
自引率
20.50%
发文量
883
审稿时长
3.0 months
期刊介绍: TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes: Circuits: Analog, Digital and Mixed Signal Circuits and Systems Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems Software for Analog-and-Logic Circuits and Systems Control aspects of Circuits and Systems.
期刊最新文献
Table of Contents Guest Editorial Special Issue on the 2024 ISICAS: A CAS Journal Track Symposium IEEE Circuits and Systems Society Information IEEE Transactions on Circuits and Systems--II: Express Briefs Publication Information A 4.3 GS/s Time-Interleaved ΔΣ DAC With Temperature-Insensitive Bias and Harmonic Cancellation for Qubit Control
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1