Pub Date: 2024-09-17 DOI: 10.1109/TCSII.2024.3462560
Rujian Cao;Zhongyu Zhao;Ka-Fai Un;Wei-Han Yu;Rui P. Martins;Pui-In Mak
Dataflow management yields only limited performance improvement for the transformer model because it reuses weights less than a convolutional neural network does. The cosFormer reduces computational complexity while achieving performance comparable to the vanilla transformer on natural language processing tasks. However, the unstructured sparsity in the cosFormer makes it difficult to implement efficiently. This brief proposes a parallel unstructured sparsity handling (PUSH) scheme to compute sparse-dense matrix multiplication (SDMM) efficiently. It transforms unstructured sparsity into structured sparsity and reduces total memory access by balancing the memory accesses of the sparse and dense matrices in the SDMM. We also employ unstructured weight pruning in cooperation with PUSH to further increase the structured sparsity of the model. Verified on an FPGA platform, the proposed accelerator achieves a throughput of 2.82 TOPS and an energy efficiency of 144.8 GOPS/W on the HotpotQA dataset with long sequences.
An FPGA-Based Transformer Accelerator With Parallel Unstructured Sparsity Handling for Question-Answering Applications. IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 71, no. 11, pp. 4688-4692, 2024.
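As a rough illustration of the sparse-dense matrix multiplication (SDMM) named in the abstract above, the following Python sketch stores the sparse operand in compressed sparse row (CSR) form and skips its zero entries during the product. This is a generic software example, not the PUSH scheme or the accelerator's dataflow, neither of which the abstract details; the function names `to_csr` and `sdmm` are hypothetical and chosen only for this example.

```python
def to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only non-zero entries
                col_idx.append(j)  # remember which column each came from
        row_ptr.append(len(values))  # end offset of this row in `values`
    return values, col_idx, row_ptr


def sdmm(csr, dense):
    """Multiply a CSR sparse matrix by a dense matrix, touching only non-zeros."""
    values, col_idx, row_ptr = csr
    n_cols = len(dense[0])
    out = []
    for r in range(len(row_ptr) - 1):
        acc = [0] * n_cols
        # Each non-zero of sparse row r scales one row of the dense matrix.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, j = values[k], col_idx[k]
            for c in range(n_cols):
                acc[c] += v * dense[j][c]
        out.append(acc)
    return out


# Hypothetical toy operands, not data from the paper.
sparse = [[0, 2, 0], [1, 0, 0], [0, 0, 0]]
dense = [[1, 2], [3, 4], [5, 6]]
result = sdmm(to_csr(sparse), dense)
print(result)
```

In this form the number of dense-matrix rows fetched depends on where the non-zeros happen to fall, which is one way to see why irregular (unstructured) sparsity leads to irregular memory access.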
Pub Date: 2024-09-17 DOI: 10.1109/tcsii.2024.3462557
Nannan Li, Hanrui Zhang, Bin Liu, Lei Pei, Jinfu Wang, Huanhuan Qi, Jie Zhang, Xiaofei Wang, Hong Zhang
A 10-bit 500-MS/s Pipelined SAR ADC With Nonlinearity-Compensated Open-loop Amplifier and Parallel Conversion Through Comparator Reusing. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.
Pub Date: 2024-09-16 DOI: 10.1109/tcsii.2024.3460809
Shixing Li, Chenyi Wang, Zhongzhen Tong, Chao Wang, Bi Wang, Zhaohao Wang
A Novel Radiation-Hardened, Speed and Power Optimized Nonvolatile Latch for Aerospace Applications. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.
Pub Date: 2024-09-13 DOI: 10.1109/tcsii.2024.3460072
Mehran Ghahramani, Hamed Hoznian, Amir Nikpaik
A Bandwidth Extension Technique for Improving Jitter in Ring-VCO-Based Sub-Sampling PLLs. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.
Pub Date: 2024-09-13 DOI: 10.1109/tcsii.2024.3460171
Kang-Li Xu, Zhen Li, Peter Benner
Parametric Interpolation Model Order Reduction on Grassmann Manifolds by Parallelization. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.
Pub Date: 2024-09-12 DOI: 10.1109/tcsii.2024.3459091
Ahmed S. Elwakil, Anis Allagui, Mohamed B. Elamien, Costas Psychalinos, Brent Maundy
Closed Form Expressions for the Input Impedance of Some 2-D Fractal Circuit Networks. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.
Pub Date: 2024-09-10 DOI: 10.1109/tcsii.2024.3457494
Jianfei Wang, Jia Hou, Fahong Zhang, Yishuo Meng, Yang Su, Chen Yang
An Efficient and Parallelism-Scalable Large Integer Multiplier Architecture Using Least-Positive Form and Winograd Fast Algorithm. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024.