{"title":"High-Flexibility Designs of Quantized Runtime Reconfigurable Multi-Precision Multipliers","authors":"Yuhao Liu;Shubham Rai;Salim Ullah;Akash Kumar","doi":"10.1109/LES.2023.3298736","DOIUrl":null,"url":null,"abstract":"Recent research widely explored the quantization schemes on hardware. However, for recent accelerators only supporting 8 bits quantization, such as Google TPU, the lower-precision inputs, such as 1/2-bit quantized neural network models in FINN, need to extend the data width to meet the hardware interface requirements. This conversion influences communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, our work explores two novel reconfigurable multiplier designs that can repartition the number of input channels in runtime based on input precision and reconfigure the signed/unsigned multiplication modes. In this letter, we explored two novel runtime reconfigurable multi-precision multipliers based on the multiplier-tree and bit-serial multiplier architectures. We evaluated our designs by implementing a systolic array and single-layer neural network accelerator on the Ultra96 FPGA platform. The result shows the flexibility of our implementation and the high speedup for low-precision quantized multiplication working with a fixed data width of the hardware interface.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"194-197"},"PeriodicalIF":1.7000,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10261986/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
Recent research has widely explored quantization schemes for hardware. However, on accelerators that support only 8-bit quantization, such as the Google TPU, lower-precision inputs, such as the 1-/2-bit quantized neural network models produced by FINN, must be widened to match the hardware interface, which degrades communication and computing efficiency. To improve the flexibility and throughput of quantized multipliers, our work explores two novel reconfigurable multiplier designs that can repartition the number of input channels at runtime based on the input precision and reconfigure between signed and unsigned multiplication modes. In this letter, we present two runtime reconfigurable multi-precision multipliers based on the multiplier-tree and bit-serial multiplier architectures. We evaluated our designs by implementing a systolic array and a single-layer neural network accelerator on the Ultra96 FPGA platform. The results demonstrate the flexibility of our implementation and the high speedup achieved for low-precision quantized multiplication over a fixed-width hardware interface.
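To make the repartitioning idea concrete, the following is a minimal software sketch, not the authors' hardware design: it assumes a fixed 8-bit input word that is split at runtime into 8/precision unsigned sub-operands (channels), with one product produced per channel. The function name packed_multiply and the lane layout are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative software analogy of runtime repartitioning by precision
 * (assumed layout, not the paper's multiplier-tree or bit-serial design):
 * a fixed 8-bit word carries 8/precision unsigned sub-operands, and one
 * product is emitted per sub-channel. Precision may be 2, 4, or 8 bits. */
static void packed_multiply(uint8_t a, uint8_t b, unsigned precision,
                            uint16_t products[], unsigned *count)
{
    unsigned lanes = 8u / precision;              /* number of input channels */
    uint8_t mask = (uint8_t)((1u << precision) - 1u);

    for (unsigned i = 0; i < lanes; ++i) {
        uint8_t ai = (uint8_t)((a >> (i * precision)) & mask);
        uint8_t bi = (uint8_t)((b >> (i * precision)) & mask);
        products[i] = (uint16_t)(ai * bi);        /* one product per channel */
    }
    *count = lanes;
}

int main(void)
{
    uint16_t prod[8];
    unsigned n;

    /* Two 8-bit words, each holding four 2-bit operands. */
    packed_multiply(0xB1u, 0x7Eu, 2u, prod, &n);
    for (unsigned i = 0; i < n; ++i)
        printf("channel %u: %u\n", i, prod[i]);
    return 0;
}
```

The sketch only shows why a fixed 8-bit interface can serve four 2-bit, two 4-bit, or one 8-bit channel; the signed/unsigned mode switching and the actual multiplier-tree and bit-serial datapaths are described in the letter itself.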
About the Journal
The IEEE Embedded Systems Letters (ESL) provides a forum for the rapid dissemination of the latest technical advances in embedded systems and related areas of embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.