PAPA - packed arithmetic on a prefix adder for multimedia applications
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030719
N. Burgess
This paper introduces PAPA: packed arithmetic on a prefix adder, a new approach to parallel prefix adder design that supports a wide variety of packed arithmetic computations, including packed add and subtract with saturation, packed rounded average, and packed absolute difference. The approach consists of altering the prefix adder cell logic equations to take advantage of a previously unused "don't care" state. The principle of logical effort is employed to assess the delay of the new adder architecture by establishing the extra effort needed to select and drive the appropriate carry signal to the requisite sum sub-word. This adder will find applications in video processors and other multimedia-orientated processor chips that implement packed arithmetic operations.
{"title":"PAPA - packed arithmetic on a prefix adder for multimedia applications","authors":"N. Burgess","doi":"10.1109/ASAP.2002.1030719","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030719","url":null,"abstract":"This paper introduces PAPA: packed arithmetic on a prefix adder, a new approach to parallel prefix adder design that supports a wide variety of packed arithmetic computations, including packed add and subtract with saturation, packed rounded average, and packed absolute difference. The approach consists of altering the prefix adder cell logic equations to take advantage of a previously unused \"don't care\" state. The principle of logical effort is employed to assess the delay of the new adder architecture by establishing the extra effort needed to select and drive the appropriate carry signal to the requisite sum sub-word. This adder will find applications in video processors and other multimedia-orientated processor chips that implement packed arithmetic operations.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128626057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matrix engine for signal processing applications using the logarithmic number system
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030730
E. Chester, J. N. Coleman
An architecture design is presented for a device based upon the logarithmic number system (LNS) that is capable of performing general matrix and complex arithmetic, with features useful for DSP system-on-chip applications. A modified LNS addition/subtraction unit is employed in multiple execution units to achieve a maximum single-precision floating-point (FP) equivalent throughput of 3.2 Gflop/s at a clock frequency of 200 MHz. Each execution unit is capable of computing functions of the form (ab + cd)^e for e ∈ {±0.5, ±1, ±2} in a 5-stage arithmetic pipeline and returning a result every cycle, yielding a considerable per-cycle improvement over both floating- and fixed-point systems. Comparisons with existing devices and a single floating-point unit are given.
{"title":"Matrix engine for signal processing applications using the logarithmic number system","authors":"E. Chester, J. N. Coleman","doi":"10.1109/ASAP.2002.1030730","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030730","url":null,"abstract":"An architecture design is presented for a device based upon the logarithmic number system (LNS) that is capable of performing general matrix and complex arithmetic, with features useful for DSP system-on-chip applications. A modified LNS addition/subtraction unit is employed in multiple execution units to achieve a maximum single-precision floating-point (FP) equivalent throughput of 3.2 Gflop/s at a clock frequency of 200 MHz. Each execution unit is capable of computing functions of the form (ab + cd)/sup e/ for e /spl isin/ {/spl plusmn/0.5, /spl plusmn/1, /spl plusmn/2} in a 5-stage arithmetic pipeline and returning a result every cycle, yielding a considerable per-cycle improvement over both floating- and fixed-point systems. Comparisons with existing devices and a single floating-point unit are given.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125622358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating products of nonlinear functions by indirect bipartite table lookup
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030710
D. Matula, A. Fit-Florea, L. McFearin
Many function approximation procedures can obtain enhanced accuracy by an efficient table lookup of a product z = f(x)g(y). Both x and y are represented by indices of i leading bits (typically 7 < i < 16) for arguments normalized to [0, 1] or [1, 2]. Direct bipartite lookup employs i/2 bits each of x and y, yielding roughly an i/2-bit result which can lose 2 to 3 bits of accuracy when f and g are nonlinear. Indirect bipartite lookup first generates i/2-bit interval index values for f(x) and g(y) using separate j-bits-in, i/2-bits-out tables for f(x) and g(y), where i/2 < j < i and j is chosen large enough to substantially reduce the effect of nonlinearity in f(x) and g(y). The separate tables readily compensate for the high nonlinearity in f and/or g and generate interval index values representing intervals that can be tailored to minimize the maximum error of the product z = f(x)g(y), which is determined by an interval product table with the concatenated interval indices as the i-bit input. We describe several variations in interval index generation methodology and in the design of the interval product table lookup architecture so as to obtain accuracy of i/2 bits (or better) in the output in 2-3 cycles of table lookup latency.
{"title":"Evaluating products of nonlinear functions by indirect bipartite table lookup","authors":"D. Matula, A. Fit-Florea, L. McFearin","doi":"10.1109/ASAP.2002.1030710","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030710","url":null,"abstract":"Many function approximation procedures can obtain enhanced accuracy by an efficient table lookup of a product z=f(x)g(y). Both x and y are represented by indices of i leading bits (typically 7<i<16) for arguments normalized to [0, 1] or [1, 2]. Direct bipartite lookup employs 1/2 bits each of x and y yielding roughly an 1/2 bit result which can lose 2 to 3 bits of accuracy when f and g are nonlinear. Indirect bipartite lookup first generates i/2 bit interval index values for f(x) and g(y) using separate j-bits-in 1/2bits-out tables for f(x) and g(y) where i/2<j<i and is chosen large enough to substantially reduce the effect of nonlinearity in f(x) and g(y). The separate tables readily compensate for the high nonlinearity in f and/or g and generate interval index values representing intervals that can be tailored to minimize the maximum error of the product z=f(x)g(y) determined by an interval product table with the concatenated interval indices as the i bit input. We describe several variations in interval index generation methodology and in the design of the interval product table lookup architecture so as to obtain accuracy of 1/2 bits (or better) in output in 2-3 cycles of table lookup latency.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132247345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduced power consumption for MPEG decoding with LNS
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030705
M. Arnold
By reducing the accuracy of the logarithmic number system (LNS) it is possible to achieve lower power consumption for multimedia applications, such as MPEG, without significantly lowering the visual quality of the output. An LNS wordsize of 8 to 10 bits produces MPEG output comparable to that of a fixed-point wordsize of 14 to 16 bits. The switching activity of an LNS ALU that computes the inverse discrete cosine transform (IDCT) is one quarter that of fixed point, implying lower power consumption. By skipping inputs that are zero (which MPEG can do naturally with its run-length coding and zigzag ordering), the switching activity of LNS MPEG becomes one-tenth that of fixed point, in contrast to the minimal impact zero skipping has on fixed-point power consumption.
{"title":"Reduced power consumption for MPEG decoding with LNS","authors":"M. Arnold","doi":"10.1109/ASAP.2002.1030705","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030705","url":null,"abstract":"By reducing the accuracy of the logarithmic number system (LNS) it is possible to achieve lower power consumption for multimedia applications, such as MPEG, without significantly lowering the visual quality of the output. An LNS wordsize of 8 to 10 bits produces a comparable MPEG output as a fixed-point wordsize of 14 to 16 bits. The switching activity of an LNS ALU that computes the inverse discrete cosine transform (IDCT) is one quarter that of fixed point, implying lower power consumption. By skipping inputs that are zero (which MPEG can do naturally with its run-length coding and zigzag ordering) the switching activity of LNS MPEG becomes one-tenth that of fixed point, in contrast to the minimal impact zero skipping has on fixed-point power consumption.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127628861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instruction stream mutation for non-deterministic processors
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030727
J. Irwin, D. Page, N. Smart
Differential power analysis (DPA) has become a real-world threat to the security of cryptographic hardware devices such as smart-cards. Using cheap and readily available equipment, an attacker can easily compromise algorithms running on these devices in a non-invasive manner. Adding non-determinism to the execution of cryptographic algorithms has been proposed as a defence against these attacks. One way of achieving this non-determinism is to introduce random additional operations into the algorithm, which produce noise in the power profile of the device. We describe the addition of a specialised processor pipeline stage which increases the level of potential non-determinism and hence guards against the revelation of secret information.
{"title":"Instruction stream mutation for non-deterministic processors","authors":"J. Irwin, D. Page, N. Smart","doi":"10.1109/ASAP.2002.1030727","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030727","url":null,"abstract":"Differential power analysis (DPA) has become a real-world threat to the security of cryptographic hardware devices such as smart-cards. By using cheap and readily available equipment, attacks can easily compromise algorithms running on these devices in a non-invasive manner. Adding non-determinism to the execution of cryptographic algorithms has been proposed as a defence against these attacks. One way of achieving this non-determinism is to introduce random additional operations to the algorithm which produce noise in the power profile of the device. We describe the addition of a specialised processor pipeline stage which increases the level of potential non-determinism and hence guards against the revelation of secret information.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114536779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New results on array contraction [memory optimization]
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030735
A. Darte, Guillaume Huard
Array contraction is an optimization that transforms array variables into scalar variables within a loop. While the opposite transformation, scalar expansion, is used to enable parallelism (at a cost in memory size), array contraction is used to save memory by removing temporary arrays and to increase locality. Several heuristics have already been proposed to perform array contraction through loop fusion and/or loop shifting, but until now the complexity of the problem was unknown and no exact approach was available. In this paper, we prove two NP-completeness results that characterize the problem precisely, and we give a practical integer linear programming formulation to solve the problem exactly.
{"title":"New results on array contraction [memory optimization]","authors":"A. Darte, Guillaume Huard","doi":"10.1109/ASAP.2002.1030735","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030735","url":null,"abstract":"Array contraction is an optimization that transforms array variables into scalar variables within a loop. While the opposite transformation, scalar expansion, is used for enabling parallelism (with a penalty in memory size), array contraction is used to save memory by removing temporary arrays and to increase locality. Several heuristics have already been proposed to perform array contraction through loop fusion and/or loop shifting, but thus far, the complexity of the problem was unknown, and no exact approach was available. In this paper, we prove two NP-complete results that characterize precisely the problem and we give a practical integer linear programming formulation to solve the problem exactly.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114647787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nanocomputing with delays
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030699
J. Fortes
The push to obtain smaller and denser circuits solely based on lithography and silicon technology is quickly reaching limits imposed by device physics and processing technology. It is anticipated that these limits will invalidate Moore's law and lead to unacceptable manufacturing costs, unreliable devices, and hard-to-manage power dissipation and interconnect problems. Nanotechnologies that rely on self-assembly, biomolecular components, and nanoelectronics are promising alternatives to silicon-based microelectronics. They will eventually enable levels of integration that exceed those of today's silicon-based microelectronics by three orders of magnitude. These nascent technologies present intriguing challenges and exciting opportunities to use biologically inspired solutions to address system architecture questions. This paper discusses recent results of an ongoing collaborative research effort by nanotechnologists, neurocomputing experts, and computer and circuit designers to explore novel architectures for nanoscale neuromorphic systems. The focus is placed on implementations whose behavior depends on how propagation delays affect communication among system components. The components under consideration are reminiscent of spiking neurons and, unlike in classical systems, interconnect is used for computation as well as communication purposes. Hybrid systems are also briefly discussed.
{"title":"Nanocomputing with delays","authors":"J. Fortes","doi":"10.1109/ASAP.2002.1030699","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030699","url":null,"abstract":"The push to obtain smaller and denser circuits solely based on lithography and silicon technology is quickly reaching limits imposed by device physics and processing technology. It is anticipated that these limits will invalidate Moore's law and lead to unacceptable manufacturing costs, unreliable devices, and hard-to-manage power dissipation and interconnect problems. Nanotechnologies that rely on self-assembly, biomolecular components, and nanoelectronics are promising alternatives to silicon-based microelectronics. They will eventually enable levels of integration that exceed that of today's silicon-based microelectronics by three orders of magnitude. These nascent technologies present intriguing challenges and exciting opportunities to use biologically inspired solutions to address system architecture questions. This paper discusses recent results of an ongoing collaborative research effort by nanotechnologists, neurocomputing experts, and computer and circuit designers to explore novel architectures for nanoscale neuromorphic systems. The focus is placed on implementations whose behavior depends on how propagation delays affect communication among system components. The components under consideration are reminiscent of spiking neurons and, unlike in classical systems, interconnect is used for computation as well as communication purposes. Hybrid systems are also briefly discussed.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117202708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Polynomial evaluation on multimedia processors
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030725
J. Villalba, G. Bandera, Mario A. González, J. Hormigo, E. Zapata
In this paper we deal with polynomial evaluation on new processor architectures for multimedia applications. We introduce algorithms that take advantage of the new attributes of multimedia processors, such as VLIW (very long instruction word) and SIMD (single instruction, multiple data) architectures. Algorithms that support polynomial evaluation using only addition/shift operations, and other algorithms based on MAC (multiply-and-add) instructions, are analyzed and tailored to the subword parallelism units of the new processors. Both potential instruction-level and machine-level parallelism are fully exploited through concurrent use of all functional units.
{"title":"Polynomial evaluation on multimedia processors","authors":"J. Villalba, G. Bandera, Mario A. González, J. Hormigo, E. Zapata","doi":"10.1109/ASAP.2002.1030725","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030725","url":null,"abstract":"In this paper we deal with polynomial evaluation based on new processor architectures for multimedia applications. We introduce some algorithms to take advantage of the new attributes of multimedia processors, such as VLIW (very long instruction word) and SIMD (single instruction multiple data architecture) architectures. Algorithms to support polynomial evaluation based only in addition/shift operations and other different algorithms with MAC (multiply-and-add) instructions are analyzed and tailored to subword parallelism units of the new processors. Both potential instruction-level and machine-level parallelism are fully exploited through concurrent use of all functional units.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122659328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mathematical model of trace cache
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030715
A. Hossain, D. Pease, James S. Burns, N. Parveen
Wide-issue superscalar processors are capable of executing several basic blocks in a cycle. A regular instruction cache fetch mechanism cannot support this high fetch throughput requirement. Several improvements to the fetch mechanism are currently in use; one of the most successful is the addition of an instruction memory structure known as a trace cache. In this paper an analytical model of the instruction fetch performance of a trace cache microarchitecture is presented. Parameters that affect trace cache instruction fetch performance are explored and several analytical expressions are presented. The model can be used to understand performance tradeoffs in trace cache design. Results from the validation of the model are presented: the instruction fetch rates predicted by the model differ by seven percent from the simulated fetch rates for SPEC2000 benchmark programs. The model is implemented in a computer program named Tulip, and results from Tulip are also presented to show how different parameters influence performance.
{"title":"A mathematical model of trace cache","authors":"A. Hossain, D. Pease, James S. Burns, N. Parveen","doi":"10.1109/ASAP.2002.1030715","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030715","url":null,"abstract":"Wide-issue superscalar processors have capabilities to execute several basic blocks in a cycle. A regular instruction cache fetch mechanism is not capable of supporting this high fetch throughput requirement. Several improvements of the fetch mechanism are currently in use. One of the most successful of these improvements is the addition of an instruction memory structure known as a trace cache. In this paper an analytical model of instruction fetch performance of a trace cache microarchitecture is presented. Parameters, which affect trace cache instruction fetch performance, are explored and several analytical expressions are presented. The presented model can be used to understand performance tradeoffs in trace cache design. Results from the validation of the model are presented. The instruction fetch rates predicted by the model differ by seven percent from the simulated fetch rates for SPEC2000 benchmark programs. The model is implemented in a computer program named Tulip. To show how different parameters influence performance, results from Tulip are also presented.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129835793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mid-texturing pixel rasterization pipeline architecture for 3D rendering processors
Pub Date: 2002-07-17 | DOI: 10.1109/ASAP.2002.1030717
W. Park, Kilwhan Lee, Il-San Kim, T. Han, Sung-Bong Yang
As 3D scenes become increasingly complex and screen resolutions increase, the design of an effective memory architecture is one of the most important issues for 3D rendering processors. We propose a pixel rasterization architecture that performs the depth test twice, before and after texture mapping. By performing the depth test before texture mapping, the proposed architecture eliminates the memory bandwidth wasted on fetching texture data for obscured pixels. It also reduces the miss penalties of the pixel cache by using a pre-fetch scheme: a frame memory access due to a cache miss at the first depth test is performed simultaneously with texture mapping. The proposed pixel rasterization architecture uses memory bandwidth effectively and reduces power consumption, producing high performance gains.
{"title":"A mid-texturing pixel rasterization pipeline architecture for 3D rendering processors","authors":"W. Park, Kilwhan Lee, Il-San Kim, T. Han, Sung-Bong Yang","doi":"10.1109/ASAP.2002.1030717","DOIUrl":"https://doi.org/10.1109/ASAP.2002.1030717","url":null,"abstract":"As a 3D scene becomes increasingly complex and the screen resolution increases, the design of effective memory architecture is one of the most important issues for 3D rendering processors. We propose a pixel rasterization architecture, which performs a depth test operation twice, before and after texture mapping. The proposed architecture eliminates memory bandwidth waste caused by fetching unnecessary obscured texture data, by performing the depth test before texture mapping. The proposed architecture reduces the miss penalties of the pixel cache by using a pre-fetch scheme - that is, a frame memory access, due to a cache miss at the first depth test, is done simultaneously with texture mapping. The proposed pixel rasterization architecture achieves memory bandwidth effectiveness and reduces power consumption, producing high-performance gains.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122849466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}