An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications
Rujian Cao; Zhongyu Zhao; Ka-Fai Un; Wei-Han Yu; Rui P. Martins; Pui-In Mak
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 11, pp. 4688-4692, published 2024-09-17.
DOI: 10.1109/TCSII.2024.3462560 · https://ieeexplore.ieee.org/document/10681589/
Citations: 0
Abstract
Dataflow management provides only limited performance improvement for transformer models because they exhibit less weight reuse than convolutional neural networks. The cosFormer reduces computational complexity while achieving performance comparable to the vanilla transformer on natural language processing tasks. However, the unstructured sparsity in the cosFormer makes efficient hardware implementation challenging. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces total memory access by balancing the memory accesses to the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning in concert with PUSH to further increase the structured sparsity of the model. Verified on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPS/W on the HotpotQA dataset with long sequences.
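The brief itself contains no code; as a rough illustration of the kind of sparse-dense matrix multiplication the PUSH scheme targets, here is a minimal NumPy sketch of SDMM over a CSR-compressed pruned weight matrix. The function names (`dense_to_csr`, `sdmm`), the magnitude-pruning threshold, and the matrix shapes are all illustrative assumptions, not the paper's design; the point is only that memory traffic for the sparse operand scales with its nonzero count, which is the quantity PUSH balances against accesses to the dense operand.

```python
import numpy as np

def dense_to_csr(a, tol=0.0):
    """Convert a dense matrix to a simple CSR triple (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        nz = np.nonzero(np.abs(row) > tol)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def sdmm(values, col_idx, row_ptr, dense):
    """Sparse (CSR) x dense product: only nonzero weights trigger memory accesses.

    Each nonzero in row i gathers one row of `dense`, so traffic for the
    sparse matrix grows with its nonzero count rather than its full size.
    """
    m = len(row_ptr) - 1
    out = np.zeros((m, dense.shape[1]), dtype=dense.dtype)
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            out[i] += values[k] * dense[col_idx[k]]
    return out

# Toy check: crude magnitude pruning to unstructured sparsity (assumed
# threshold, not the paper's pruning method), then verify against dense matmul.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))
w[np.abs(w) < 1.0] = 0.0
x = rng.standard_normal((16, 4))
vals, cols, ptrs = dense_to_csr(w)
assert np.allclose(sdmm(vals, cols, ptrs, x), w @ x)
```

In this naive form the gathers from `dense` are irregular, which is exactly why unstructured sparsity is hard to accelerate; PUSH's contribution, per the abstract, is reorganizing that irregularity into structured sparsity so parallel hardware can exploit it.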
Journal overview:
TCAS II publishes brief papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and to signal processing. Coverage spans the whole spectrum from basic scientific theory to industrial applications. The field of interest includes:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.