Low-cost constant time signed digit selection for most significant bit first multiplication

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Microprocessors and Microsystems Pub Date : 2024-11-01 Epub Date: 2024-10-18 DOI:10.1016/j.micpro.2024.105118

Ghassem Jaberipur , Saeid Gorgin , Jeong-A. Lee

{"title":"Low-cost constant time signed digit selection for most significant bit first multiplication","authors":"Ghassem Jaberipur , Saeid Gorgin , Jeong-A. Lee","doi":"10.1016/j.micpro.2024.105118","DOIUrl":null,"url":null,"abstract":"<div><div>Serial binary multiplication is frequently used in many digital applications. In particular, left-to-right (aka online) manipulation of operands promotes the real-time generation of product digits for immediate utilization in subsequent online computations (e.g., successive layers of a neural network). In the left-to-right arithmetic operations, where a residual is maintained for digit selection, utilization of a redundant number system for the representation of outputs is mandatory, while the input operands and the residual may be redundant or non-redundant. However, when the input data paths are narrow (e.g., eight bits as in BFloat16), conventional non-redundant representations of inputs and residual provide some advantages. For example, the immediate and costless sign detection of the residual that is necessary for the next digit selection; a property not shared by redundant numbers. Nevertheless, digit selection, as practiced in the previous realizations, with both redundant and non-redundant inputs and/or residual, is slow and rather complex. Therefore, in this paper, we offer an imprecise, but faster digit selection scheme, with the required correction in the next cycle. Analytical evaluations and synthesis of the proposed circuits on FPGA platform, shows 30 % speedup and less cost with respect to both cases with redundant and non-redundant inputs and residual.</div></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"111 ","pages":"Article 105118"},"PeriodicalIF":2.6000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessors and Microsystems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141933124001133","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/18 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Serial binary multiplication is frequently used in many digital applications. In particular, left-to-right (aka online) manipulation of operands promotes the real-time generation of product digits for immediate utilization in subsequent online computations (e.g., successive layers of a neural network). In the left-to-right arithmetic operations, where a residual is maintained for digit selection, utilization of a redundant number system for the representation of outputs is mandatory, while the input operands and the residual may be redundant or non-redundant. However, when the input data paths are narrow (e.g., eight bits as in BFloat16), conventional non-redundant representations of inputs and residual provide some advantages. For example, the immediate and costless sign detection of the residual that is necessary for the next digit selection; a property not shared by redundant numbers. Nevertheless, digit selection, as practiced in the previous realizations, with both redundant and non-redundant inputs and/or residual, is slow and rather complex. Therefore, in this paper, we offer an imprecise, but faster digit selection scheme, with the required correction in the next cycle. Analytical evaluations and synthesis of the proposed circuits on FPGA platform, shows 30 % speedup and less cost with respect to both cases with redundant and non-redundant inputs and residual.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

低成本恒定时间有符号数位选择，用于最显著位首数乘法

串行二进制乘法经常用于许多数字应用中。特别是，操作数的从左到右（又称在线）运算可促进实时生成乘积数字，以便在随后的在线计算（如神经网络的连续层）中立即使用。在从左到右的算术运算中，需要保留一个残差用于数字选择，因此必须使用冗余数字系统来表示输出，而输入操作数和残差可以是冗余或非冗余的。然而，当输入数据路径较窄时（如 BFloat16 中的 8 位），传统的非冗余输入和残差表示法具有一些优势。例如，下一位数选择所需的残差可立即、无代价地进行符号检测；这是冗余数字所不具备的特性。尽管如此，在以往的实现过程中，利用冗余和非冗余输入和/或残差进行数字选择的速度很慢，而且相当复杂。因此，在本文中，我们提供了一种不精确但更快的数字选择方案，并在下一个周期进行所需的校正。在 FPGA 平台上对所提电路进行的分析评估和综合显示，与冗余和非冗余输入及残差两种情况相比，速度提高了 30%，成本降低了。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Microprocessors and Microsystems 工程技术-工程：电子与电气

CiteScore

6.90

自引率

3.80%

发文量

204

审稿时长

172 days

期刊介绍： Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms ranging from custom hardware via reconfigurable systems and application specific processors to general purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC) and multi-processor systems on a chip (MPSoC), as well as, their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems including methodologies, techniques, flows and tools for their design, as well as, novel designs of hardware components fall within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central in this journal. While software is not in the main focus of this journal, methods of hardware/software co-design, as well as, application restructuring and mapping to embedded hardware platforms, that consider interplay between software and hardware components with emphasis on hardware, are also in the journal scope.

期刊最新文献

CGR-AI Engine: A scalable CGRA-based processing platform for Artificial Intelligence in space applications The TEXTAROSSA project: Cool all the Way Down to the Hardware Analyzing the impact of functional approximation on the resilience of Deep Neural Networks Efficient associative processing in FPGA LoLiPoP-IoT: Advancing the energy-efficient Internet of Things