[1990] Proceedings of the International Conference on Application Specific Array Processors: latest articles

Analysing parametrised designs by non-standard interpretation

[1990] Proceedings of the International Conference on Application Specific Array Processors

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145450

W. Luk

The authors consider the use of a nonstandard interpretation to analyze parametrized circuit descriptions, in particular for array-based architectures. Various metrics are employed to characterize the performance tradeoffs for generic designs. The objective is to facilitate the comparison of feasible design alternatives at an early stage of development. The research centers on techniques for extracting various performance attributes, such as critical path and latency, from a single generic design representation. The features of this approach include uniformity, modularity, reusability, flexibility, and computerized support.

Citations: 9

A VLSI architecture for simplified arithmetic Fourier transform algorithm

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145490

I. Reed, M. Shih, T. Truong, E. Hendon, D. Tufts

The arithmetic Fourier transform (AFT) is a number-theoretic approach to Fourier analysis which has been shown to perform competitively with the classical fast Fourier transform (FFT) in terms of accuracy, complexity, and speed. Theorems developed previously for the AFT algorithm are used to derive the original AFT algorithm which Bruns found in 1903. This is shown to yield an algorithm of less complexity and of improved performance over certain recent AFT algorithms. A computationally balanced AFT algorithm for Fourier analysis and signal processing is developed. This algorithm does not require complex multiplications. A VLSI architecture is suggested for this simplified AFT algorithm. This architecture uses a butterfly structure which reduces the number of additions by 25% over that used by the direct method. This efficient AFT algorithm is shown to be identical to Bruns's original AFT algorithm.

Citations: 35

PASIC: a sensor/processor array for computer vision

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145472

Keping Chen, P. Danielsson, Anders Åström

The PASIC prototype chip contains 256 × 256 photosensors, a linear array of 256 A/D converters, two 256 8-bit shift registers, 256 bit-serial processors, and a 256 × 256 bit dynamic RAM. It appears to be a viable architecture for low-level vision processing. The processors operate in SIMD mode at 20 MHz. To avoid high-speed transfer of analog data, an A/D converter in the form of a linear array of comparators is used. The architecture of the processing part conforms to the row-parallel output from the A/D converters. A simple but efficient processor, well suited to the special VLSI constraints of the sensor, was designed. The pitch in the present version of PASIC is 30 μm, and it was possible to fit the A/D converter circuitry, the shift register, the ALU, and the memory into this narrow slot. A key factor is the unified structure achieved by extending the memory data bus to all other units within the same column. The versatility of the chip is shown using three algorithms: edge detection, shading correction, and histogram-based thresholding. Each is executed in approximately 10 ms.
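As a rough sanity check on the figures in this abstract, the clock rate and run time imply a cycle budget per image row. The sketch below is back-of-the-envelope arithmetic only. It assumes that the full 10 ms is spent computing at 20 MHz and that each of the 256 column processors works through the 256 rows in turn; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
# Back-of-the-envelope cycle budget for one PASIC algorithm run.
# Assumptions (not from the abstract): the whole 10 ms is compute time,
# and the 256 column processors step through the image one row at a time.

CLOCK_HZ = 20_000_000   # 20 MHz SIMD clock (from the abstract)
RUNTIME_MS = 10         # approximate run time per algorithm (from the abstract)
ROWS = 256              # image rows in the 256 x 256 sensor array

# Total clock cycles available to one algorithm.
total_cycles = CLOCK_HZ * RUNTIME_MS // 1000

# Cycles each bit-serial processor can spend on one pixel of its column.
cycles_per_row = total_cycles / ROWS

print(f"total cycles:   {total_cycles}")
print(f"cycles per row: {cycles_per_row}")
```

Under these assumptions each bit-serial processor has a few hundred cycles per pixel, which is consistent with multi-bit operations being carried out one bit at a time.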

Citations: 4

Mapping high-dimension wavefront computations to silicon

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145445

Chen-Mie Wu, R. Owens, M. J. Irwin

The authors present a new template-matching algorithm with good recognition performance. However, this new algorithm exhibits a complex, four-dimensional wavefront architecture. Thus, for VLSI implementation, reduced architectures with fewer connections and processors need to be derived. For this purpose, the authors develop a systematic reduction methodology to manually map wavefront computations from high dimension to low dimension. This methodology consists of seven steps. Based on this methodology, the authors derive several two-dimensional architectures for the new template-matching algorithm that are suitable for VLSI implementation, and they simulate one of these architectures on the Intel iPSC/2 hypercube machine.

Citations: 2

Application specific VLSI architectures based on De Bruijn graphs

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145498

D. Pradhan

The author provides an overview of various key features of De Bruijn graph-based VLSI architectures. The advantages of De Bruijn architectures over other architectures, such as cube and shuffle-exchange, are discussed. Important differences between De Bruijn interconnects and others are also described, as is the evolution of the De Bruijn interconnect. The FFT architecture and the Viterbi decoder for convolutional codes are examined in detail. The issues of routing and fault tolerance are addressed.
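For readers unfamiliar with the interconnect, the following is a minimal sketch of the standard binary De Bruijn graph and of simple shift-in routing on it. The abstract does not give the paper's definitions, so the graph variant (binary, directed, 2^n nodes) and the routing scheme shown here are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Binary De Bruijn graph on 2**n nodes: node x links to the two nodes
# obtained by shifting x left one bit and appending 0 or 1.
# This is the textbook definition, assumed here for illustration only.

def de_bruijn_successors(node: int, n: int) -> tuple[int, int]:
    """Return the two successors of `node` in the n-bit binary De Bruijn graph."""
    mask = (1 << n) - 1
    shifted = (node << 1) & mask   # drop the top bit, make room for a new low bit
    return shifted, shifted | 1


def de_bruijn_graph(n: int) -> dict[int, tuple[int, int]]:
    """Adjacency map of the whole graph."""
    return {v: de_bruijn_successors(v, n) for v in range(1 << n)}


def route(src: int, dst: int, n: int) -> list[int]:
    """Shift-in routing: feed the bits of `dst` (most significant first) into
    `src`. Always reaches `dst` in exactly n hops; shorter paths may exist."""
    mask = (1 << n) - 1
    path = [src]
    cur = src
    for i in reversed(range(n)):
        bit = (dst >> i) & 1
        cur = ((cur << 1) & mask) | bit
        path.append(cur)
    return path


graph = de_bruijn_graph(3)
print(graph[0b101])              # successors of node 5
print(route(0b101, 0b011, 3))    # path from node 5 to node 3
```

Every node has out-degree 2 regardless of network size, which is the property usually contrasted with the hypercube, whose node degree grows with the logarithm of the node count.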

Citations: 0

The RAP: a ring array processor for layered network calculations

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145467

N. Morgan, J. Beck, P. Kohn, J. Bilmes, E. Allman, J. Beer

The authors have designed and implemented a ring array processor, RAP, for fast implementation of layered neural network algorithms. The RAP is a multi-DSP system targeted at continuous speech recognition using connectionist algorithms. Four boards, each with four Texas Instruments TMS320C30 DSPs, serve as an array processor for a 68020-based host running a real-time operating system. The overall system is controlled from a Sun workstation via Ethernet. Each board includes 16 MB of dynamic memory (expandable to 64 MB) and 1 MB of fast static RAM. Theoretical peak performance is 128 MFLOPS/board, and test runs with the first working board show a sustained throughput of roughly one-third to one-half of this for algorithms of interest. Software development is aided by a Sun workstation-based command interpreter, tools from the standard C environment, and a library of matrix and vector routines.
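The performance figures in this abstract can be unpacked with a little arithmetic. The sketch below derives the per-DSP peak and the sustained range for one board from the numbers quoted; the four-board total assumes linear scaling across boards, which the abstract does not claim (only the first working board was measured). Variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the abstract.
PEAK_MFLOPS_PER_BOARD = 128
DSPS_PER_BOARD = 4
BOARDS = 4

# Peak rate of a single DSP implied by the per-board figure.
peak_per_dsp = Fraction(PEAK_MFLOPS_PER_BOARD, DSPS_PER_BOARD)

# Sustained throughput on one board: roughly one-third to one-half of peak.
sustained_low = Fraction(PEAK_MFLOPS_PER_BOARD, 3)
sustained_high = Fraction(PEAK_MFLOPS_PER_BOARD, 2)

# ASSUMPTION: peak scales linearly with board count (not stated in the source).
system_peak = PEAK_MFLOPS_PER_BOARD * BOARDS

print(f"peak per DSP:        {float(peak_per_dsp):.0f} MFLOPS")
print(f"sustained per board: {float(sustained_low):.1f} to {float(sustained_high):.1f} MFLOPS")
print(f"four-board peak:     {system_peak} MFLOPS (assuming linear scaling)")
```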

Citations: 39

ASP modules: building-blocks for application-specific massively parallel processors

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145485

R. Lea

ASP (associative string processor) modules comprise highly versatile parallel processing building-blocks for the simple construction of application-specific second-generation massively parallel processors (MPPs). The author discusses ASP module philosophy, demonstrates how ASP modules can satisfy the market, algorithmic, architectural, and engineering requirements of application-specific MPPs, and reports on current progress in the development of ASP technology. A case example indicates that 1 TOPS/ft³, 1 GOPS/W, and 1 MOPS/$ can be reasonably forecast figures-of-merit for the cost effectiveness of second-generation MPPs built with WSI ASP modules. Comparison with first-generation MPP implementations reveals a 2-3 orders-of-magnitude advantage in favor of the ASP modules.

Citations: 4

Implementation of ANN on RISC processor array

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145502

A. Hiraiwa, M. Fujita, S. Kurosu, S. Arisawa, M. Inoue

The authors present a mesh systolic array, GCN (giga connection), for a fast simulator of artificial neural networks (ANNs). The processor element (PE) of the GCN is composed of the i860 RISC processor designed by Intel Corp., a large-scale local memory, and high-bandwidth first-in first-out devices. The mapping algorithm of the ANN onto the GCN, called the net-data partition, is discussed, and the multilayer feedforward network and Kohonen feature map are mapped onto the GCN by using this algorithm. Another parallelism that can be used for a stochastic ANN like the Boltzmann machine is also discussed. The performance of the GCN is evaluated by software simulation, and the authors achieve over 1 gigaconnection per second using 128 PEs.

Citations: 6

Systolic architectures for decoding Reed-Solomon codes

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145444

J. Nelson, Arifur Rahman, E. McQuade

A systolic implementation of a Reed-Solomon decoder is presented which, with minor modification, is suitable for BCH and Goppa codes. The various operations involved in decoding such codes were analyzed and the results are described. Systolic array architectures are derived for the various steps, including the syndrome calculation, key equation solution, and error evaluation. Since the throughput of the decoder is effectively determined by the speed of the multipliers, various multiplier architectures are discussed briefly. The architectures presented improve upon previous designs. The result is highly regular and modular, and thus more suitable for VLSI implementation.
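To make the first decoding step concrete, here is a minimal software sketch of syndrome calculation, where each syndrome is the received polynomial evaluated at a power of the primitive element using Horner's rule. It is a plain sequential model, not the systolic array from the paper. The field GF(2^4) with primitive polynomial x^4 + x + 1, the choice of α = 2, and syndromes starting at α^1 are assumptions made to keep the example small; the abstract does not state the paper's code parameters.

```python
# Syndrome calculation for a Reed-Solomon code over GF(2^4).
# ASSUMPTIONS (illustrative only): primitive polynomial x^4 + x + 1,
# primitive element alpha = 2, syndromes S_j = r(alpha^j) for j = 1..2t.

FIELD_BITS = 4
PRIM_POLY = 0b10011  # x^4 + x + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a           # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << FIELD_BITS):
            a ^= PRIM_POLY        # reduce when the degree reaches FIELD_BITS
    return result


def gf_pow(a: int, exponent: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, a)
    return result


def syndromes(received: list[int], count: int, alpha: int = 2) -> list[int]:
    """Evaluate the received word (highest-degree coefficient first) at
    alpha^1 .. alpha^count. Each Horner step is one multiply-accumulate."""
    out = []
    for j in range(1, count + 1):
        x = gf_pow(alpha, j)
        acc = 0
        for coeff in received:
            acc = gf_mul(acc, x) ^ coeff
        out.append(acc)
    return out


# The all-zero word is a codeword, so its syndromes are all zero.
print(syndromes([0] * 15, 4))
# A single error of value 1 in the degree-1 position gives S_j = alpha^j.
print(syndromes([0] * 13 + [1, 0], 4))
```

The inner loop performs one field multiplication per received symbol, which illustrates why the abstract singles out multiplier speed as the factor that determines decoder throughput.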

Citations: 1

Massively parallel architecture: application to neural net emulation and image reconstruction

Pub Date : 1990-09-05 DOI: 10.1109/ASAP.1990.145458

D. Lattard, B. Faure, G. Mazaré

The authors present two applications of a specific cellular architecture: emulation of recall and learning for feedforward neural networks, and parallel image reconstruction. This architecture is based on a two-dimensional array of asynchronous processing elements, the cells, which communicate with each other by message transfers. Each cell includes a routing part, which transports messages through the array, and a processing part dedicated to a particular application. The specificity of the processing part demands that it be redesigned for each application but leads to very fast computing and low complexity. This architecture can process algorithms not regular enough for SIMD machines. The cellular architecture is described, the feedforward neural network accelerator is introduced, the learning is discussed, and some timing results, evaluated by computer simulation, are given. The image reconstruction problem, its parallelization, some results of both functional and behavioral simulations, the realization of the circuit, and some test results are presented.

Citations: 3
