
[1992] Proceedings of the International Conference on Application Specific Array Processors: latest papers

Optimal design of lower dimensional processor arrays for uniform recurrences
K. Ganapathy, B. Wah
The authors present a parameter-based approach for synthesizing systolic architectures from uniform recurrence equations. The scheme generalizes the parameter method proposed by G.J. Li and B.W. Wah (1985). The approach synthesizes optimal arrays of any lower dimension from a general uniform recurrence description of the problem. Previous attempts to map uniform recurrences onto lower-dimensional arrays did not guarantee the optimality of the resulting designs. As an illustration of the technique, optimal linear arrays for matrix multiplication are given. A detailed design for solving path-finding problems is also presented.
DOI: https://doi.org/10.1109/ASAP.1992.218539; published 1992-08-04
Citations: 12
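To make the term "uniform recurrence" in the abstract above concrete, the following is a minimal sketch of the standard textbook formulation of matrix multiplication as a uniform recurrence, in which every variable depends on a neighbour at a constant index offset. It is not the authors' parameter method and performs no array synthesis; the function name and the dictionary-based storage are assumptions made for illustration.

```python
def matmul_uniform_recurrence(A, B):
    """Multiply two square matrices via a uniform recurrence (illustrative only)."""
    n = len(A)
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # a is passed along the j axis: dependence vector (0, 1, 0)
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                # b is passed along the i axis: dependence vector (1, 0, 0)
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                # c accumulates along the k axis: dependence vector (0, 0, 1)
                prev = 0 if k == 0 else c[i, j, k - 1]
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    # The finished product sits on the last k-plane of the index space.
    return [[c[i, j, n - 1] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(matmul_uniform_recurrence([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because all three dependence vectors are constant, the three-dimensional index space can in principle be projected onto a processor array of lower dimension, which is the kind of mapping the abstract refers to.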
Application and packaging of the AT&T DSP3 parallel signal processor
R. Shively, L. J. Wu
Achieving the potential performance of highly parallel MIMD processor architectures is critically dependent on both the speed and routing capabilities of the network fabric. The routing network of the AT&T DSP3 processor is described, together with an indication of how the 40 megabyte/s links can be configured to meet diverse application requirements. Scaling to very large configurations is aided by compact packaging. Silicon-on-silicon multi-chip modules, together with a novel three-dimensional vertical interconnection technology, are being used to repackage the DSP3 into an ultra-dense processor.
DOI: https://doi.org/10.1109/ASAP.1992.218562; published 1992-08-04
Citations: 4
Constant capacity signal flow signal processor architecture benchmark
H. Habereder, R. Harrison
This paper describes the implementation and benchmark testing of a high-performance signal processor architecture based on the alternate low level primitive structures (ALPS) concept developed by the Naval Research Laboratory. The research shows that such digital signal processor architectures are not only feasible but also provide a modular solution to a wide range of signal processing applications. In addition, the benchmark tests show that such architectures provide higher efficiency and lower data transfer network contention than existing global memory-based data flow architectures. The processor system consists of high-performance, fully programmable, embedded signal processors and controllers networked on a set of high-bandwidth busses to provide a processing capability far in excess of that offered by current systems. The modular array processor (MAP) is a networked multiprocessor with VLSI-based signal and control processing modules.
DOI: https://doi.org/10.1109/ASAP.1992.218563; published 1992-08-04
Citations: 0
High speed bit-level pipelined architectures for redundant CORDIC implementation
H. Dawid, H. Meyr
The CORDIC algorithm is well known as an efficient method for the computation of trigonometric/hyperbolic functions and vector rotations. The achievable throughput and the latency of CORDIC processors using conventional arithmetic are determined by the carry propagation occurring in additions/subtractions, since the CORDIC iterations are directed by the signs of intermediate results. Using a redundant number system, much higher throughput is possible due to the elimination of carry propagation, but exact sign detection cannot be implemented efficiently. The authors derive transformations of the original CORDIC algorithm which result in partially fixed iteration sequences that no longer depend on intermediate signs, for the CORDIC vectoring mode as well as the rotation mode. Very fast and efficient carry-save architectures using redundant absolute value computation resulting from the transformed algorithms are described. A CORDIC processor (rotation mode) is presented as an implementation example which, to the best of the authors' knowledge, is the fastest CMOS CORDIC realization today.
DOI: https://doi.org/10.1109/ASAP.1992.218559; published 1992-08-04
Citations: 17
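The sign dependence that the abstract above identifies as the bottleneck can be seen in a minimal floating-point sketch of conventional rotation-mode CORDIC. This is the classic algorithm, not the authors' transformed, redundant-arithmetic variant; the function name and the iteration count are assumptions chosen for illustration, and the sketch converges only for angles of magnitude below roughly 1.74 rad.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians with conventional CORDIC (illustrative only)."""
    z = angle
    gain = 1.0
    for i in range(iterations):
        # The direction of each micro-rotation is the sign of the residual
        # angle z. This is the data dependence on intermediate signs that
        # limits speed when sign detection is slow.
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        # Every micro-rotation stretches the vector by sqrt(1 + 2^(-2i)).
        gain *= math.sqrt(1.0 + shift * shift)
    # Divide out the accumulated gain to recover a pure rotation.
    return x / gain, y / gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, math.pi / 6))
```

In hardware, the multiplications by powers of two are shifts and the gain is a fixed constant, so each iteration reduces to additions and subtractions whose direction depends on the sign of `z`.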
An architecture for tree search based vector quantization for single chip implementation
Heonchul Park, V. Prasanna, Cho-Li Wang
Vector quantization (VQ) has become feasible for use in real-time applications by employing VLSI technology. The authors propose a new search algorithm and an architecture for implementing it, which can be used in real-time image processing. This search algorithm takes $O(k)$ time units on a sequential machine, where $k$ is the dimension of the codevectors, assuming unit time corresponds to one comparison operation. The proposed architecture employs a single processing element (PE) and $O(N)$ external memory for storing $N$ hyperplanes used in the search, where $N$ is the number of codevectors. Compared with known architectures for VQ in the literature, the proposed design does not perform any multiplication operation, since the search method is independent of any $L_q$ metric, $1 \le q \le \infty$. It leads to an area-efficient design with the PE consisting of a comparator and $O(k)$ registers. Also, the memory used by the design is significantly less than that employed in the known architectures.
DOI: https://doi.org/10.1109/ASAP.1992.218557; published 1992-08-04
Citations: 1
A method to synthesize modular systolic arrays with local broadcast facility
T. Risset
The author proposes a method to synthesize modular systolic arrays with local broadcast facility (i.e. arrays containing wires whose length is below a fixed, technology-dependent constant). The synthesis starts from a dependence graph which is not uniform but 'locally broadcast'. This method aims at generalizing isolated results that have recently been reported on the acceleration of systolic algorithms by using extensions of the 'pure' systolic model (wires of length > 1, wraparound, folded arrays, etc.).
DOI: https://doi.org/10.1109/ASAP.1992.218555; published 1992-08-04
Citations: 4
Mapping locally recursive SFGs upon a multiprocessor system in a ring network
Wonyong Sung, S. Mitra, Ki-Il Kum
A multiprocessor code generation method for digital signal processing algorithms represented by SFGs (signal flow graphs) is developed. To reduce the number of communication operations and to distribute the workload evenly among the processors, a multiprocessor scheduling method based on a parallel block processing scheme, which processes multiple blocks of input data concurrently, is employed. The developed method first divides an SFG into graph segments to reduce the dependency time. A segment merging step follows, which reduces the number of temporary data stores and data transfers. Multiprocessor code is generated by applying a single-processor code generation method to each of these segments. The implementation result for the QR-RLS algorithm using the developed method is included.
DOI: https://doi.org/10.1109/ASAP.1992.218544; published 1992-08-04
Citations: 4
A projective geometry architecture for scientific computation
B. Amrutur, Rajeev Joshi, N. Karmarkar
A large fraction of scientific and engineering computations involve sparse matrices. While dense matrix computations can be parallelized relatively easily, sparse matrices with arbitrary or irregular structure pose a real challenge to designers of highly parallel machines. A recent paper by N.K. Karmarkar (1991) proposed a new parallel architecture for sparse matrix computations based on finite projective geometries. The mathematical structure of these geometries plays an important role in defining the interconnections between the processors and memories in this architecture, and also aids in efficiently solving several difficult problems (such as load balancing, data routing, and memory-access conflicts) that are encountered in the design of parallel systems. The authors discuss some of the key issues in the system design of such a machine, and show how exploiting the structure of the geometry results in an efficient hardware implementation of the machine. They also present circuit designs and simulation results for key elements of the system: a 200 MHz pipelined memory; a pipelined multiplier based on an adder unit with a delay of 2 ns; and a 500 Mbit/s CMOS input/output buffer.
DOI: https://doi.org/10.1109/ASAP.1992.218581; published 1992-08-04
Citations: 2
A parallel sorting algorithm on an eight-neighbor processor array
K. Tanno, T. Takeda, Susumu Horiguchi
The authors present a new parallel sorting algorithm on an eight-neighbor processor array with wraparound connections in the rows. The algorithm is very simple because it iterates a single primitive operation: comparing and exchanging four elements simultaneously. Each processor (processing element) in the two-dimensional array can communicate with its 8 neighbouring processors (where they exist). By making full use of this communication capability and the wraparound connections, the algorithm sorts $n \times n$ elements into row-major order in time $3(n+1)(2t_r + t_c)$, where $t_r$ and $t_c$ are the times for a unit routing step and a comparison, respectively.
DOI: https://doi.org/10.1109/ASAP.1992.218552; published 1992-08-04
Citations: 0
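As a worked illustration of the sorting-time expression quoted in the abstract above, the sketch below simply evaluates 3(n+1)(2·t_r + t_c) for given parameters. The function name and the sample values are assumptions for illustration; the code does not implement the sorting algorithm itself.

```python
def sorting_time(n, t_r, t_c):
    """Evaluate the quoted bound 3(n+1)(2*t_r + t_c) for an n-by-n array."""
    return 3 * (n + 1) * (2 * t_r + t_c)


if __name__ == "__main__":
    # Example with unit routing and comparison costs on an 8-by-8 array.
    print(sorting_time(8, t_r=1, t_c=1))
```

The expression grows linearly in n while the number of elements sorted grows as n squared.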
Algorithms and architectures for high performance recursive filtering
S. E. McQuillan, J. McCanny
Recently, a number of most-significant-digit (msd) first bit-parallel multipliers for recursive filtering have been reported. However, the design approach used has, in general, been heuristic; consequently, optimality has not always been assured. In this paper, msd-first multiply-accumulate algorithms are described, and important relationships governing the dependencies between latency, number representations, etc. are derived. A more systematic approach to designing recursive filters is illustrated by applying the algorithms and associated relationships to the design of cascadable modules for high-sample-rate IIR filtering and wave digital filtering.
DOI: https://doi.org/10.1109/ASAP.1992.218569; published 1992-08-04
Citations: 5