SAUST: A Scheme for Acceleration of Unstructured Sparse Transformer
Yifan Song, Shunpeng Zhao, Song Chen, Yi Kang
2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), published 2022-10-28
DOI: 10.1109/ICTA56932.2022.9963119 (https://doi.org/10.1109/ICTA56932.2022.9963119)
Citations: 0
Abstract
Transformers achieve impressive results on many AI tasks, but at the cost of a huge amount of computation. Pruning is a promising way to reduce this computation load by producing sparse transformer models. To avoid the load imbalance caused by computation on zero elements, previous works have explored structured pruning combined with hardware acceleration. However, the tight constraints of structured pruning usually make training much harder and ultimately yield lower sparsity levels. This paper proposes SAUST, a scheme that exploits the high sparsity achievable with unstructured pruning and addresses the load imbalance problem with both hardware and software methods. An FPGA implementation shows that SAUST achieves 3.35x and 2.76x execution-time speedups over two state-of-the-art hardware accelerators.
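The abstract contrasts structured pruning (which constrains where zeros may appear) with unstructured pruning (which zeroes individual weights anywhere, reaching higher sparsity at the cost of irregular memory access). The paper does not publish its pruning code, so as a minimal illustration of the unstructured case only, here is a generic magnitude-based pruning sketch; the function name and list-of-lists weight layout are assumptions, not the authors' implementation:

```python
def magnitude_prune(weights, sparsity):
    """Generic unstructured magnitude pruning (illustrative, not SAUST's code):
    zero out the smallest-magnitude fraction of weights, with no constraint
    on where the zeros land -- hence the irregular, load-imbalanced pattern
    that SAUST's hardware/software scheme is designed to handle."""
    # Collect all magnitudes and find the pruning threshold.
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)          # number of weights to remove
    threshold = flat[k - 1] if k > 0 else float("-inf")
    # Zero every weight at or below the threshold, keep the rest.
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

# Example: pruning 50% of a 2x2 weight matrix removes the two
# smallest-magnitude entries, wherever they sit.
pruned = magnitude_prune([[0.1, -0.5], [0.9, 0.02]], 0.5)
```

In contrast, a structured scheme would be forced to drop entire rows, columns, or fixed-size blocks, which keeps hardware utilization regular but caps the reachable sparsity, the trade-off the abstract highlights.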