[1990] Proceedings of the International Conference on Application Specific Array Processors: Latest Articles

Bit-level systolic algorithm for the symmetric eigenvalue problem

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145511

J. Delosme

An arithmetic algorithm is presented which speeds up the parallel Jacobi method for the eigen-decomposition of real symmetric matrices. After analyzing the elementary mathematical operations in the Jacobi method (i.e., the evaluation and application of Jacobi rotations), the author devises arithmetic algorithms that effect these mathematical operations with few primitive operations (i.e., few shifts and adds) and enable the most efficient use of the parallel hardware. The matrices to which the plane Jacobi rotations are applied are decomposed into even and odd parts, enabling the application of the rotations from a single side and thus removing some sequentiality from the original method. The rotations are evaluated and applied in a fully concurrent fashion with the help of an implicit CORDIC algorithm. In addition, the CORDIC algorithm can perform rotations with variable resolution, which leads to a significant reduction in the total computation time.
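To illustrate what "few shifts and adds" means for a plane rotation, here is a minimal sketch of the textbook (explicit) CORDIC rotation in fixed point. It is not the implicit CORDIC variant the abstract refers to, and it does not model the even/odd decomposition. The names `FRAC_BITS`, `ITERATIONS`, and `cordic_rotate` are assumptions chosen for this example.

```python
import math

FRAC_BITS = 16    # fixed-point fraction bits (assumed for this sketch)
ITERATIONS = 16   # number of micro-rotations (assumed for this sketch)

# Elementary angles atan(2^-i), stored in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
# the accumulated gain is divided out once at the end.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x, y, angle):
    """Rotate (x, y) counter-clockwise by `angle` radians (|angle| < ~1.74).

    The loop body uses only integer shifts, additions, and subtractions.
    """
    xi = round(x * (1 << FRAC_BITS))
    yi = round(y * (1 << FRAC_BITS))
    zi = round(angle * (1 << FRAC_BITS))  # residual angle still to rotate
    for i in range(ITERATIONS):
        if zi >= 0:
            xi, yi = xi - (yi >> i), yi + (xi >> i)
            zi -= ATAN_TABLE[i]
        else:
            xi, yi = xi + (yi >> i), yi - (xi >> i)
            zi += ATAN_TABLE[i]
    scale = GAIN * (1 << FRAC_BITS)
    return xi / scale, yi / scale
```

Running fewer iterations gives a coarser rotation at lower cost, which is one plausible reading of the "variable resolution" mentioned in the abstract; the paper's actual mechanism is not described here.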

Citations: 16

Reconfiguration of FFT arrays: a flow-driven approach

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145476

A. Antola, N. Scarabottolo

A new reconfiguration algorithm for defect and fault tolerance in fast Fourier transform (FFT) two-dimensional arrays is presented. The reconfiguration scheme is based on the data flow of the algorithm to minimize the overhead due to the re-routing of information in the reconfigured array. Evaluation of the effectiveness of this approach shows a significant increase in system robustness with respect to other, non-dedicated reconfiguration approaches. Moreover, the possibility of choosing between two reconfiguration algorithms characterized by different complexities and efficiencies results in both an optimal, host-driven reconfiguration (particularly suited for end-of-production yield enhancement) and a fast, self-performed reconfiguration (suited for on-line reliability enhancement).

Citations: 3

Digit-serial VLSI microarchitecture

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145482

S. Smith, J. Payne, R. Morgan

The authors illustrate the techniques by which a simple function library may be widely parameterized to meet the diverse function, throughput and accuracy requirements in high-performance integer arithmetic applications. In a design automation environment the user's view of these structures is, in the case of multipliers and adders, a simple functional icon carrying synthetic parameters which are derived from global throughput and accuracy requirements. Shifters are included automatically for consistency, allowing usage of the specified numerical resources to be maximized for any application. Processors with throughputs approaching one billion operations per second may be easily assembled using these techniques, a figure that is difficult to achieve in conventional architectures. The full power of parallelism and pipelining is brought to bear on computational problems, the price paid being the loss of inherent programmability.

Citations: 1

A formal design methodology for parallel architectures

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145496

K. Elleithy, M. Bayoumi

The authors introduce a formal approach for synthesis of array architectures. The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. Four different forms are used to express the input algorithm: simultaneous recursion, recursion with respect to different variables, fixed nesting, and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting, and required hardware.

Citations: 2

Spacetime-minimal systolic architectures for Gaussian elimination and the algebraic path problem

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145509

A. Benaini, Y. Robert

The authors have designed two systolic arrays that are both time-minimal and space-minimal for Gaussian elimination and the algebraic path problem (APP), thereby establishing the systolic complexity of these two computational kernels. The systolic computation is modeled by a directed acyclic graph (DAG) with nodes corresponding to computed values and arcs denoting dependencies. The computation DAG is taken to be fixed and given. The time to compute a DAG is determined when a timing function is assigned, or scheduled, to the nodes, subject to the constraints that a node can be computed only when its predecessors (the nodes which it depends upon) have been computed at previous steps, and no processor can compute two different nodes at the same time step. For a problem of size n, the authors obtain an execution time T(n) = 3n - 1 using A(n) = n^2/4 + O(n) processors for Gaussian elimination, and T(n) = 5n - 2 and A(n) = n^3/3 + O(n) for the APP.
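The two scheduling constraints stated in the abstract can be written down directly. The following is a minimal sketch, not the authors' construction: the function names and the dictionary-based representation of the DAG, the timing function, and the processor allocation are assumptions made for illustration.

```python
def is_valid_schedule(preds, time, proc):
    """Check the two constraints on a scheduled computation DAG.

    preds: dict mapping each node to an iterable of its predecessor nodes
    time:  dict mapping each node to its (integer) time step
    proc:  dict mapping each node to the processor that computes it
    """
    # Dependence constraint: every predecessor is computed at an earlier step.
    for node, parents in preds.items():
        if any(time[p] >= time[node] for p in parents):
            return False
    # Resource constraint: a processor computes at most one node per step.
    busy = set()
    for node in time:
        slot = (proc[node], time[node])
        if slot in busy:
            return False
        busy.add(slot)
    return True


def execution_time(time):
    """Number of time steps spanned by the schedule."""
    return max(time.values()) - min(time.values()) + 1


def processor_count(proc):
    """Number of distinct processors used by the allocation."""
    return len(set(proc.values()))
```

For a toy DAG in which node `c` depends on `a` and `b`, scheduling `a` and `b` at step 0 on two different processors and `c` at step 1 is valid; placing `a` and `b` on the same processor at step 0 violates the resource constraint.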

Citations: 23

The design of a high-performance scalable architecture for image processing applications

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145506

C. T. Gray, Wentai Liu, T. Hughes, R. Cavin

The authors present the organization of an interleaved wrap-around memory system for a partitionable parallel/pipeline architecture with P pipes of L processors each. The architecture is designed to efficiently support real-time image processing and computer vision algorithms, especially those requiring global data operations. The interleaved memory system makes the architecture highly scalable in that L and P can be chosen to optimize performance for particular problems and reconfigurable in that, once L and P are fixed, problems of any size can still be mapped onto the architecture. The authors demonstrate techniques and methods for mapping computational structures to the architecture by considering the case of the 1-D butterfly network (1DBN). Since many other computational structures can be mapped to 1DBN, this gives a firm application base for the architecture. The authors also demonstrate methods for scheduling and controlling the memory system.

Citations: 5

Byte-serial convolvers

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145489

L. Dadda

It is shown that previously proposed bit-serial convolver schemes (with weights in parallel form), working with zero separation between samples, can be transformed into byte-serial input schemes with a comparable clock rate, thus affording an increase in sampling rate equal to the number of bits in each byte. This is achieved by adopting a modified carry-save circuit. The proposed schemes are based on a modified version of serial-parallel multipliers and on the use of pre-computed multiples of the weights. The case of 2-bit bytes is fully developed. It is shown that the use of samples represented in a biased binary number system leads to schemes that are only slightly more complex than the corresponding bit-serial schemes. The bit rate is determined by the delays of a full adder and a flip-flop. The schemes are composed of a number of bit-slices and appear to be easily partitionable into identical cascaded modules suitable for a fault-tolerant architecture and a WSI implementation.
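The idea of feeding a sample two bits at a time and selecting from pre-computed multiples of a weight can be sketched in software. This is only an arithmetic illustration under stated assumptions: samples are unsigned integers of `sample_bits` bits, and the carry-save circuitry and biased number representation from the abstract are not modeled. The function names are hypothetical.

```python
def multiply_byte_serial(sample, weight, sample_bits=8):
    """Multiply an unsigned sample by a weight, consuming 2 bits per step.

    Each 2-bit "byte" of the sample selects one of four pre-computed
    multiples of the weight, so no general multiplication is needed
    inside the loop.
    """
    multiples = [0, weight, 2 * weight, 3 * weight]  # pre-computed once per weight
    acc = 0
    for shift in range(0, sample_bits, 2):
        digit = (sample >> shift) & 0b11   # next 2-bit byte, least significant first
        acc += multiples[digit] << shift   # add the selected multiple at its position
    return acc


def convolve(samples, weights, sample_bits=8):
    """Valid-mode convolution built on the byte-serial multiply above."""
    n = len(weights)
    return [
        sum(multiply_byte_serial(samples[k - j], weights[j], sample_bits)
            for j in range(n))
        for k in range(n - 1, len(samples))
    ]
```

With 2-bit bytes an 8-bit sample is consumed in four steps instead of eight, which corresponds to the abstract's claim that the sampling rate increases by a factor equal to the number of bits per byte.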

Citations: 5

A real-time software programmable processor for HDTV and stereo scope signals

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145459

T. Nishitani, I. Tamitani, H. Harasaki, M. Yano

The architecture is an expanded version of a previously reported video signal processor in which a number of parallel processor clusters can be combined in a tandem connection form or in a parallel connection form. The new video signal processor introduces programmable time-expansion and time-compression circuits to A-to-D and D-to-A converters, respectively, for coping with high-speed HDTV signals. It also employs input/output switch units before and after the parallel processor clusters, which make it possible to input several video signals simultaneously. With these additional units, an HDTV signal is converted to a set of NTSC-level video signals in the time-expansion circuit. Every NTSC-level video signal is then delivered to the parallel processor clusters through an input switch unit. After processing in the clusters, the NTSC-level signals are converted to an HDTV signal through an output switch unit and time-compression circuits. This architecture can be applied to stereo scope processing.

Citations: 4

Mapping algorithms onto the TUT cellular array processor

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145461

J. Viitanen, T. Korpiharju, J. Takala, H. Kiminkinen

The Tampere University of Technology Cellular Array (TUTCA) processor array is based on a dynamically configurable logic cell array. It is intended for efficient implementation of the direct mapping dataflow principle with a self-timed, distributed control structure. The architecture of the processor, principles of mapping algorithms on it, and the compiler of the dataflow language are described. The language used for programming is a slightly modified version of DFL. The main features of DFL, the parser, the array processing, the graph structure generated by DFL, and the performance and exploitation of parallelism are considered.

Citations: 4

GRAPE: a special-purpose computer for N-body problems

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145455

J. Makino, T. Ito, T. Ebisuzaki, D. Sugimoto

GRAPE (GRAvity PipE) is a special-purpose computer designed to accelerate the numerical integration of the astrophysical N-body problem. The prototype hardware, GRAPE-1, is designed as the backend processor that calculates the gravitational interaction between particles. All other calculations are performed on the host computer connected to GRAPE-1. For large-N calculations (N greater than or approximately equal to 10^4), GRAPE-1 achieves the equivalent of about 200 Mflops on a single board measuring about 40 cm by 30 cm, consuming 2.5 W of power. The specialized pipelined architecture of GRAPE-1, optimized for large-N calculations, is the key to the high performance. The authors describe the design, construction and programming of GRAPE-1. The architecture is quite simple, and it is easy to put one pipeline into one LSI chip and make many pipelines work in parallel, without creating a communication bottleneck.
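The division of labor described above (the backend computes pairwise gravitational interactions, the host does everything else) centers on a direct-summation force loop. The sketch below shows that loop in software as an illustration only; it says nothing about GRAPE-1's number formats or pipeline stages. The function name, the softening parameter `eps`, and the choice of units with `G = 1` are assumptions for this example.

```python
def accelerations(positions, masses, eps=0.0, G=1.0):
    """Direct-summation gravitational accelerations, O(N^2) pair interactions.

    positions: list of [x, y, z] coordinates, one per particle
    masses:    list of particle masses
    eps:       softening length (hypothetical parameter; 0 means pure Newtonian)
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc
```

In a host-plus-backend arrangement, the host would call a routine like this once per time step and then advance positions and velocities itself. The inner loop over `j` is the part whose cost grows as N^2, which is why it is the natural candidate for a dedicated pipeline.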

Citations: 4
