Low-Latency and High-Bandwidth Pipelined Radix-64 Division and Square Root Unit

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH) Pub Date : 2022-09-01 DOI:10.1109/ARITH54963.2022.00012

J. Bruguera

Citations: 1

Abstract

Digit-recurrence algorithms are widely used in current microprocessors to compute floating-point division and square root. These iterative algorithms offer a good trade-off between performance, area and power. Commercial processors have non-pipelined division and square root units in which part of the logic is reused over several cycles. The main drawbacks of these non-pipelined units are the long latency of traditional division and square root implementations, the low bandwidth (or throughput) caused by reusing part of the logic over several cycles, and the hardware complexity of separate logic for division and square root. We present a radix-64 floating-point division and square root algorithm that uses a common iteration for both operations, in which each radix-64 iteration is made of two simpler radix-8 iterations. The radix-64 algorithm yields low-latency operations, and the common division and square root radix-64 iteration results in some area reduction. The algorithm is mapped onto a low-latency, high-bandwidth pipelined unit.
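To make the idea of building one radix-64 iteration from two radix-8 iterations concrete, here is a minimal Python sketch of the division recurrence w[j+1] = 8*w[j] - q*d applied twice per iteration, so each iteration retires 6 quotient bits. It is an illustration under simplifying assumptions, not the unit described in the paper: it uses exact rational arithmetic and a non-redundant digit set (0 to 7) chosen by exact comparison, whereas hardware digit-recurrence units typically use redundant signed digits and estimate-based digit selection. The function names `radix8_step` and `divide_radix64` are hypothetical, and the sketch covers division only, assuming 0 <= x < d.

```python
from fractions import Fraction


def radix8_step(w, d):
    """One radix-8 step: shift the remainder by 8 and subtract q*d.

    Assumes 0 <= w < d, so the selected digit q is in 0..7 and the
    returned remainder again satisfies 0 <= w_next < d.
    """
    shifted = 8 * w
    q = int(shifted // d)  # exact, non-redundant digit selection (assumption)
    return q, shifted - q * d


def divide_radix64(x, d, iterations):
    """Approximate x / d for 0 <= x < d, 6 quotient bits per iteration.

    Returns (quotient, remainder) with x == quotient * d + remainder.
    """
    w = Fraction(x)
    d = Fraction(d)
    quotient = Fraction(0)
    weight = Fraction(1)
    for _ in range(iterations):
        # Two chained radix-8 steps form one radix-64 iteration.
        q_hi, w = radix8_step(w, d)
        q_lo, w = radix8_step(w, d)
        digit64 = 8 * q_hi + q_lo  # radix-64 digit in 0..63
        weight /= 64
        quotient += digit64 * weight
    return quotient, w * weight
```

For example, `divide_radix64(1, 3, 4)` returns a quotient within 64**-4 of 1/3 together with the remainder that makes the identity `x == quotient * d + remainder` hold exactly.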


