[1992] Proceedings of the International Conference on Application Specific Array Processors: latest articles

A systolic array chip for robot inverse dynamics computation

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218556

Mehdi Rahman, D. Meyer

To ensure smooth and accurate movement of a robot arm, the robot inverse dynamics problem must be solved at each servo sampling. The computation of this problem, however, is a mathematically intense task which degrades the sampling period of present-day robot control systems. In addition to the repetitive requirement for its evaluation, the linearly recursive and compute-bound properties of the robot inverse dynamics problem using the Newton-Euler (N-E) equations of motion suggest that it is amenable to direct mapping onto a fixed systolic array structure. This paper presents such an architecture and discusses its implementation in 1-micron CMOS technology to compute the N-E algorithm for an n-link manipulator within a period of 69+12n clock cycles. For a six-link robot manipulator operating at the maximum device frequency of 25 MHz, the total execution time is 5.64 μs. The die size of this robot controller chip is 530 × 485 mils, and its estimated power dissipation at the specified frequency is 3.5 watts.
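
The timing figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the 69+12n cycle count and the 25 MHz clock from the abstract; the function names are hypothetical and are not taken from the paper.

```python
def ne_cycles(n_links: int) -> int:
    """Clock cycles for one N-E evaluation, using the 69 + 12n figure from the abstract."""
    return 69 + 12 * n_links


def execution_time_us(n_links: int, clock_mhz: float) -> float:
    """Execution time in microseconds: cycles divided by the clock rate in MHz."""
    return ne_cycles(n_links) / clock_mhz


if __name__ == "__main__":
    # Six-link manipulator at the stated maximum device frequency of 25 MHz.
    print(ne_cycles(6), "cycles")
    print(execution_time_us(6, 25.0), "microseconds")
```

For six links this gives 141 cycles, which at 25 MHz (40 ns per cycle) matches the 5.64 μs stated in the abstract.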

Citations: 0

Efficient scheduling methods for partitioned systolic algorithms

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218538

Prashanth Kuchibhotla, B. Rao

Various methods for mapping signal processing algorithms into systolic arrays have been developed in the past few years. In this paper, efficient scheduling techniques are developed for the partitioning problem, i.e., for problems whose size does not match the array size. In particular, scheduling for the locally parallel-globally sequential (LPGS) technique and the locally sequential-globally parallel (LSGP) technique is developed. The scheduling procedure exploits the fact that, after LPGS and LSGP partitioning, the locality constraints are modified, allowing for more flexibility. The new structure allows the authors to develop a flexible scheduling order for LPGS that is useful in evaluating a trade-off between execution time and the size of the partitioning buffers. The benefits of the scheduling techniques are illustrated with the help of matrix multiplication and least-squares examples.

Citations: 6

Some low power implementations of DSP algorithms

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218566

Joseph B. Evans, Bede Liu

The implementation of digital signal processing algorithms often requires that a variety of conflicting criteria be satisfied. A signal processing system must provide the necessary processing gains, while various measures of the feasibility and efficiency of implementation, such as power and cost, are met. This paper reviews the motivation behind the development of low power signal processing algorithms, presents some methods for addressing these problems, and gives several examples of reduced complexity signal processing implementations.

Citations: 1

Architecture and realization of a multi signal processor system

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218561

A. Gunzinger, U. A. Müller, W. Scott, B. Bäumle, P. Kohler, W. Guggenbühl

This paper describes a parallel distributed computer architecture called MUSIC (multi signal processor system with intelligent communication). A single processor element (PE) consists of a DSP 96002 from Motorola (60 MFlops), program and data memory, and a fast, independent communication interface; all communication interfaces are connected through a communication ring. A system with 30 PEs is operational. It has a peak performance of 1.8 GFlops and an electrical power consumption of about 350 watts (including forced air cooling), and it fits into a 19-inch rack. The hardware price of this system is US$40,000, which means a selling price of approximately US$200,000. Besides the well-known Mandelbrot algorithm (601 MFlops sustained), two real applications are currently implemented on the system: the backpropagation algorithm for neural net learning, which reaches a peak performance of 150 MCUPS (million connection updates per second), equal to 900 MFlops sustained, and the molecular dynamics simulation program MD-Atom (443 MFlops sustained). Other applications of the system are in digital signal processing and finite element computation.
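
The performance figures in this abstract are mutually consistent, which a short calculation makes visible. The sketch below uses only numbers quoted in the abstract; the constant and function names are hypothetical, and the "fraction of peak" ratio is an illustrative derived quantity rather than one reported by the authors.

```python
PE_COUNT = 30          # processor elements in the operational system
PE_PEAK_MFLOPS = 60    # peak rate of one DSP 96002, as quoted

# Sustained rates quoted in the abstract, in MFlops.
SUSTAINED_MFLOPS = {
    "Mandelbrot": 601,
    "backpropagation": 900,
    "MD-Atom": 443,
}


def system_peak_mflops() -> int:
    """Aggregate peak rate: number of PEs times the per-PE peak."""
    return PE_COUNT * PE_PEAK_MFLOPS


def fraction_of_peak(application: str) -> float:
    """Sustained rate of an application divided by the aggregate peak."""
    return SUSTAINED_MFLOPS[application] / system_peak_mflops()


def flops_per_connection_update(mflops: float, mcups: float) -> float:
    """Floating-point operations implied per connection update."""
    return mflops / mcups


if __name__ == "__main__":
    print(system_peak_mflops(), "MFlops peak")
    for name in SUSTAINED_MFLOPS:
        print(name, round(fraction_of_peak(name), 3))
    print(flops_per_connection_update(900, 150), "flops per connection update")
```

Thirty PEs at 60 MFlops each give the stated 1.8 GFlops, and 900 MFlops at 150 MCUPS implies six floating-point operations per connection update.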

Citations: 18

Linear scheduling is close to optimality

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218583

A. Darte, L. Khachiyan, Y. Robert

This paper deals with the problem of finding optimal schedules for uniform dependence algorithms. Given a convex domain, let $T_f$ be the total time needed to execute all computations using the free (greedy) schedule and let $T_l$ be the total time needed to execute all computations using the optimal linear schedule. The authors' main result is to bound $T_l/T_f$ and $T_l - T_f$ for sufficiently 'fat' domains.
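
To make the two quantities concrete, the following sketch computes a free-schedule length and a best linear-schedule length for a small example. It is an illustration under stated assumptions, not the authors' method: the domain is a hypothetical 4 × 4 integer square, the dependence vectors are (1, 0) and (0, 1), the linear schedule is restricted to small integer vectors found by brute force, and all function names are invented here.

```python
from itertools import product


def dot(u, v):
    """Inner product of two integer vectors."""
    return sum(a * b for a, b in zip(u, v))


def free_schedule_length(domain, deps):
    """Length of the greedy schedule: a point runs one step after its latest predecessor.

    Assumes every dependence vector is lexicographically positive, so that
    visiting points in sorted order sees all predecessors first.
    """
    points = set(domain)
    time = {}
    for p in sorted(points):
        preds = [tuple(a - b for a, b in zip(p, d)) for d in deps]
        time[p] = 1 + max((time[q] for q in preds if q in points), default=0)
    return max(time.values())


def linear_schedule_length(domain, pi):
    """Number of time steps used by the linear schedule t(p) = pi . p."""
    values = [dot(pi, p) for p in domain]
    return max(values) - min(values) + 1


def best_linear_schedule(domain, deps, bound=2):
    """Brute-force search over integer vectors pi with |pi_k| <= bound (an assumption).

    A vector is valid when pi . d >= 1 for every dependence vector d.
    """
    best = None
    for pi in product(range(-bound, bound + 1), repeat=len(deps[0])):
        if all(dot(pi, d) >= 1 for d in deps):
            length = linear_schedule_length(domain, pi)
            if best is None or length < best[0]:
                best = (length, pi)
    return best


if __name__ == "__main__":
    n = 4
    domain = [(i, j) for i in range(n) for j in range(n)]
    deps = [(1, 0), (0, 1)]
    print("T_f =", free_schedule_length(domain, deps))
    print("T_l, pi =", best_linear_schedule(domain, deps))
```

On this square domain both schedules take 2n - 1 = 7 steps, so the ratio and the difference bounded in the paper are trivially 1 and 0; less regular domains or dependence sets are where a gap can appear.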

Citations: 37

Determining longest common subsequences of two sequences on a linear array of processors

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218547

A. Mukherjee

This paper presents a special-purpose linear array processor architecture for determining longest common subsequences (LCS) of two sequences. The algorithm uses systolic and pipelined architectures suitable for VLSI implementation. The algorithms are also suitable for implementation on parallel machines. The author first develops a 'greedy' algorithm to determine some of the LCS and then proposes a generalization to determine all LCS of the given pair of sequences. Earlier hardware algorithms were concerned with determining only the length of the LCS or the edit distance of two sequences.
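
For readers unfamiliar with the problem, the sketch below shows the textbook sequential dynamic-programming formulation: the table gives the LCS length, one backtrack recovers a single LCS, and a memoized recursion enumerates all distinct LCS strings. This is a software baseline only; it is not the linear-array or 'greedy' hardware algorithm of the paper, and the function names are hypothetical.

```python
from functools import lru_cache


def lcs_table(a: str, b: str):
    """table[i][j] is the LCS length of the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def one_lcs(a: str, b: str) -> str:
    """Recover a single LCS by backtracking through the table."""
    table = lcs_table(a, b)
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def all_lcs(a: str, b: str):
    """Enumerate every distinct LCS string, following all optimal branches."""
    table = lcs_table(a, b)

    @lru_cache(maxsize=None)
    def collect(i: int, j: int):
        if i == 0 or j == 0:
            return frozenset({""})
        if a[i - 1] == b[j - 1]:
            return frozenset(s + a[i - 1] for s in collect(i - 1, j - 1))
        results = set()
        if table[i - 1][j] >= table[i][j - 1]:
            results |= collect(i - 1, j)
        if table[i][j - 1] >= table[i - 1][j]:
            results |= collect(i, j - 1)
        return frozenset(results)

    return sorted(collect(len(a), len(b)))


if __name__ == "__main__":
    x, y = "ABCBDAB", "BDCABA"
    print(lcs_table(x, y)[len(x)][len(y)])
    print(one_lcs(x, y))
    print(all_lcs(x, y))
```

The distinction the abstract draws (length only, some LCS, all LCS) corresponds to the three functions: the table alone, the single backtrack, and the full enumeration.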

Citations: 1

A reconfigurable processor array with routing LSIs and general purpose DSPs

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218578

J. Levison, I. Kuroda, T. Nishitani

A building block for a scalable signal processor array is developed with a general-purpose DSP and a message routing LSI. Each DSP can be connected by multiple routing LSIs, forming a point-to-point message-passing network with data packet communication. Low network latency is obtained by a cut-through routing technique with sufficient communication bandwidth. The employment of an on-chip routing table allows regular as well as irregular topologies with complex routing techniques such as broadcasting, multicasting and dynamic routing. The combination of DSPs (μPD77240), a flexible message-passing network and an optional application-specific I/O interface makes the processor array suitable for a wide range of high speed signal processing applications such as adaptive array processing and 3-D vision processing.

Citations: 7

On systolic mapping of multi-stage algorithms

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218582

Y. Hwang, Y. Hu

The authors present a more general mapping problem called multi-stage systolic mapping, which focuses on computing algorithms containing more than one nested loop construct to be executed sequentially. Since the resulting interface problem becomes the dominant factor in performing the mapping, the authors argue that adjacent stages should have matched interfaces to reduce the overhead. For this, the conditions for interface matching between the mappings of two stages are established. A systematic method to derive the interface-matched mapping is also presented. To mitigate the performance degradation due to the initial and final phases of computation in systolic computing, inter-stage computation concurrency is explored by overlapping part of the computations in successive stages, thus effectively reducing the computation latency. With these results, the multi-stage systolic mapping tool (MSSM) is developed, and several design examples are presented to illustrate the potential use of MSSM.

Citations: 10

Implementing a family of high performance, micrograined architectures

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218572

R. Owens, M. J. Irwin, T. Kelliher, M. Vishwanath, R. Bajwa

This paper describes the design and implementation of high performance micro-grained architectures. These architectures are capable of teraops performance. Each architecture is organized as a systolic array of processors. A prototyping system for the architectures is proposed. The prototyping system provides control, I/O, and an interface to a host system for each of the micro-grained architectures. The prototyping system has been designed with flexibility in mind to support a wide variety of these micro-grained architectures. Beyond the research outlined, the authors anticipate using the prototyping system as a 'test-bed' for various class/student VLSI design projects within the department. Three micro-grained architectures are described: an associative memory-based architecture, a Mux-based architecture and a RAM-based architecture. These architectures are useful for solving a number of important problems, such as edge detection, locating connected components, two-dimensional signal and image processing, sorting elements, and performing element permutations.

Citations: 15

Interval-related problems on reconfigurable meshes

Pub Date : 1992-08-04 DOI: 10.1109/ASAP.1992.218553

S. Olariu, J. L. Schwing, Jingyuan Zhang

Interval graphs provide a natural model for a vast number of scheduling and VLSI problems. A variety of interval graph problems have been solved on the PRAM family. Recently, a powerful architecture called the reconfigurable mesh has been proposed: in essence, a reconfigurable mesh consists of a mesh-connected architecture augmented by a dynamically reconfigurable bus system. It has been argued that the regular structure of the reconfigurable mesh is suitable for VLSI implementation. The authors develop a set of tools and show how they can be used to devise constant-time algorithms to solve a number of interval-related problems on reconfigurable meshes. These problems include finding a maximum independent set, a minimum clique cover, a minimum dominating set, and a minimum coloring, along with algorithms to compute the shortest path between a pair of intervals and, based on the shortest path, an algorithm to find the center of an interval graph. More precisely, with an arbitrary family of n intervals as input, all their algorithms run in constant time on a reconfigurable mesh of size n × n.
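
Two of the problems named in this abstract have well-known sequential solutions that help fix what is being computed. The sketch below is a sequential baseline only, not the constant-time reconfigurable-mesh algorithms of the paper; it assumes closed intervals given as (left, right) pairs, so intervals that merely touch count as intersecting, and the function names are hypothetical.

```python
def max_independent_set(intervals):
    """Greedy by right endpoint: a largest set of pairwise disjoint intervals."""
    chosen = []
    last_end = None
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        # Closed intervals: the next one must start strictly after the last end.
        if last_end is None or left > last_end:
            chosen.append((left, right))
            last_end = right
    return chosen


def min_coloring_size(intervals):
    """Colors needed for the interval graph: the maximum number of overlapping intervals."""
    events = []
    for left, right in intervals:
        events.append((left, 0))   # 0 sorts before 1, so opens precede closes at a tie
        events.append((right, 1))
    depth = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


if __name__ == "__main__":
    family = [(1, 3), (2, 5), (4, 7), (6, 8), (9, 10)]
    print(max_independent_set(family))
    print(min_coloring_size(family))
```

Both routines are dominated by an O(n log n) sort; the paper's contribution is to obtain such results in constant time on an n × n reconfigurable mesh.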

Citations: 2
