
Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors: Latest Papers

A linear array parallel image processor: SliM-II
Hyunman Chang, S. Ong, M. Sunwoo
This paper describes the architecture and design of a general-purpose parallel image processor chip called the SliM-II Image Processor. The chip has a linear array of 64 processing elements (PEs), operates at 30 MHz in worst-case simulation, and delivers 1.92 GIPS. SliM-II can greatly reduce inter-PE communication overhead through sliding, which overlaps inter-PE communication with computation. In contrast to existing array processors, each PE has a multiplier, which is quite effective for convolution, template matching, etc. The instruction set can execute an ALU operation, data I/O, and inter-PE communication simultaneously in one instruction cycle. In addition, during an ALU/multiplier operation, SliM-II provides parallel load/store between the register file and on-chip memory, as in DSP chips. The bandwidth of data I/O and inter-PE communication increases due to bit-parallel paths. We developed VHDL models and performed logic synthesis using the COMPASS™ CAD tool. We used the COMPASS™ 3.3 V 0.6 μm standard cell library (v8r4.9.1). The total number of transistors is about 1.5 million. The SliM-II chip is being fabricated at LG Semiconductor Co., Ltd. The performance estimation shows a significant improvement for algorithms requiring multiplications compared with existing array processors.
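The throughput figure and the role of the per-PE multiplier can be illustrated with a small sketch. It assumes one instruction per PE per clock cycle (the abstract does not state this, but it is consistent with 64 PEs at 30 MHz giving 1.92 GIPS). The function `linear_array_convolve` is a hypothetical software model of a linear array holding one pixel per PE; it does not model the actual SliM-II sliding mechanism, and in software the shift and the multiply-accumulate are not overlapped.

```python
def peak_gips(num_pes: int, clock_hz: int, instr_per_cycle: int = 1) -> float:
    """Peak throughput in GIPS, assuming every PE issues instr_per_cycle per clock."""
    return num_pes * clock_hz * instr_per_cycle / 1e9


def linear_array_convolve(pixels, kernel):
    """Model a 1-D multiply-accumulate sweep on a linear array, one pixel per PE.

    For each kernel tap, every PE multiplies its current value by the tap and
    accumulates; then all values shift one PE to the left (zero fill at the
    right edge). The result is acc[i] = sum_k kernel[k] * pixels[i + k].
    """
    acc = [0] * len(pixels)
    window = list(pixels)
    for tap in kernel:
        acc = [a + tap * w for a, w in zip(acc, window)]  # per-PE multiply-accumulate
        window = window[1:] + [0]                          # inter-PE shift
    return acc


if __name__ == "__main__":
    print(peak_gips(64, 30_000_000))
    print(linear_array_convolve([1, 2, 3, 4], [1, 1]))
```

Each kernel tap costs one multiply and one neighbor shift per PE, which is why a hardware multiplier in every PE and communication that overlaps with computation both matter for this class of algorithm.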
DOI: https://doi.org/10.1109/ASAP.1997.606810 (published 1997-07-14)
Citations: 4
Realization of a nonlinear digital filter on a DSP array processor
H. Kwan, E. Powers, E. Swartzlander
This paper presents the performance evaluation of a fast third-order Volterra digital filtering algorithm mapped onto an AT&T DSP-3 parallel processor. Five different implementations are considered. Speed-up results indicate that the "time-skewing" method is currently the fastest. An application to nonlinear communication channel equalization using a 64-QAM signal constellation is presented.
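For orientation, a third-order Volterra filter adds quadratic and cubic kernel terms to an ordinary FIR filter. The sketch below is the direct, unoptimized form with cost O(M^3) per output sample; it is not the fast algorithm or the "time-skewing" mapping evaluated in the paper, and the names `volterra3`, `h1`, `h2`, and `h3` are illustrative.

```python
def volterra3(x, h1, h2, h3):
    """Direct-form third-order Volterra filter with memory length M = len(h1).

    y[n] = sum_i     h1[i]       * x[n-i]
         + sum_i,j   h2[i][j]    * x[n-i] * x[n-j]
         + sum_i,j,k h3[i][j][k] * x[n-i] * x[n-j] * x[n-k]
    Samples before the start of x are taken as zero.
    """
    m = len(h1)
    y = []
    for n in range(len(x)):
        # Delay line: d[i] holds x[n-i], or 0 before the signal starts.
        d = [x[n - i] if n - i >= 0 else 0 for i in range(m)]
        linear = sum(h1[i] * d[i] for i in range(m))
        quadratic = sum(h2[i][j] * d[i] * d[j]
                        for i in range(m) for j in range(m))
        cubic = sum(h3[i][j][k] * d[i] * d[j] * d[k]
                    for i in range(m) for j in range(m) for k in range(m))
        y.append(linear + quadratic + cubic)
    return y


if __name__ == "__main__":
    # Memory length 1: y[n] = 2*x[n] + 3*x[n]**2 + x[n]**3
    print(volterra3([1, 2], [2], [[3]], [[[1]]]))
```

The cubic term dominates the operation count, which is what makes this kind of filter a natural candidate for a parallel DSP array.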
DOI: https://doi.org/10.1109/ASAP.1997.606809 (published 1997-07-14)
Citations: 3
Low latency word serial CORDIC
J. Villalba, T. Lang
In this paper we present a modification of the CORDIC algorithm which reduces the number of iterations almost to half by merging two successive iterations of the basic algorithm. The two coefficients per iteration are obtained with only a small increase in the cycle time by estimating one of the coefficients. A correcting iteration method is used to correct the possible errors produced by the estimate. Moreover, the modified iteration permits the reduction of the number of cycles required for the compensation of the scaling factor. The resulting architecture is word serial, working both in rotation and vectoring operation modes, presenting a low latency in comparison with the classical CORDIC approach.
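As a baseline for comparison, the classical CORDIC rotation mode performs one shift-and-add micro-rotation per iteration and compensates the scaling factor separately. The sketch below shows only that classical scheme in floating point; it does not implement the merged two-coefficient iteration or the correcting iteration proposed in the paper, and `cordic_rotate` is an illustrative name.

```python
import math


def cordic_rotate(angle, iterations=32):
    """Classical rotation-mode CORDIC: returns (cos(angle), sin(angle)).

    Valid for |angle| up to roughly 1.74 rad, the convergence range of the
    basic algorithm. One micro-rotation by atan(2**-i) is applied per iteration.
    """
    # Scaling factor K = prod sqrt(1 + 2**(-2i)); pre-compensated in the start vector.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction from residual angle
        shift = 2.0 ** (-i)                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)            # table lookup in hardware
    return x, y


if __name__ == "__main__":
    c, s = cordic_rotate(math.pi / 6)
    print(c, s)
```

In a word-serial architecture each loop iteration costs one cycle, so latency grows linearly with the word length; that is the cost the merged iterations described in the abstract aim to cut.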
DOI: https://doi.org/10.1109/ASAP.1997.606819 (published 1997-07-14)
Citations: 7
Mapping multirate dataflow to complex RT level hardware models
J. Horstmannshoff, Thorsten Grötker, H. Meyr
The design of digital signal processing systems typically consists of an algorithm development phase carried out at a behavioral level and the selection of an efficient hardware architecture for implementation. To speed up the joint optimization of algorithms and architectures, a fast path to implementation must be provided. This can be achieved efficiently by directly mapping the data flow specification of the system to an RTL target architecture by means of HDL code generation. For algorithm design, communication systems are most easily modeled using multirate data flow graphs in which no notion of time is maintained. HDL code generation introduces a cycle-based timing model and maps the data flow models to RTL implementations, which are usually taken from a library. Due to the increase in ASIC design complexity, these building blocks reach a high level of functionality and have complex interfacing properties. Therefore, it becomes necessary to generate additional interfacing and controlling hardware to synthesize an operable system. In this paper, we present a new approach to mapping multirate dataflow graphs to complex RTL hardware models and derive algorithms to synthesize these high-level RTL building blocks into a complete operable system.
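A multirate dataflow graph fixes how many tokens each block produces and consumes per firing, so a first step before any timing model can be introduced is usually to find how often each block fires per period. The sketch below solves the standard balance equations of synchronous dataflow for a connected graph; it is a generic illustration, not the mapping algorithm of the paper, and `repetition_vector` and the edge tuple layout are assumptions made for this example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts q with q[src]*prod == q[dst]*cons.

    edges is a list of (src, dst, tokens_produced, tokens_consumed).
    Assumes the graph is connected; raises ValueError if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:                       # propagate relative rates along edges
        a = pending.pop()
        for src, dst, prod, cons in edges:
            if src == a and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                pending.append(dst)
            elif dst == a and src not in rate:
                rate[src] = rate[dst] * cons / prod
                pending.append(src)

    for src, dst, prod, cons in edges:   # every balance equation must hold
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent sample rates")

    # Scale the rational rates to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


if __name__ == "__main__":
    # A produces 2 tokens, B consumes 3; B produces 1, C consumes 2.
    print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

With firing counts in hand, a code generator can derive clock-enable ratios or buffer sizes for the RTL blocks, which is the kind of interfacing and control hardware the abstract refers to.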
DOI: https://doi.org/10.1109/ASAP.1997.606834 (published 1997-07-14)
Citations: 27
A logical framework to prove properties of ALPHA programs
L. Bougé, D. Cachera
We present an assertional approach to prove properties of ALPHA programs. ALPHA is a functional language based on affine recurrence equations. We first present two kinds of operational semantics for ALPHA together with some equivalence and confluence properties of these semantics. We then present an attempt to provide ALPHA with an external logical framework. We therefore define a proof method based on invariants. We focus on a particular class of invariants, namely canonical invariants, that are a logical expression of the program's semantics. We finally show that this framework is well-suited to prove partial properties, equivalence properties between ALPHA programs and properties that we cannot express within the ALPHA language.
DOI: https://doi.org/10.1109/ASAP.1997.606825 (published 1997-07-14)
Citations: 3
Conception and design of a RISC CPU for the use as embedded controller within a parallel multimedia architecture
S. Dogimont, M. Gumm, F. Mombers, D. Mlynek, A. Torielli
In this paper, the problem of defining a high performance control structure for a parallel motion estimation architecture for MPEG2 coding is addressed. Various design and architecture choices are discussed and the final architecture is described. It represents a combined MIMD-SIMD approach which is based on a small but efficient ASIP with subword parallelism.
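Subword parallelism means treating one machine word as several narrower lanes that are processed by a single instruction. The sketch below emulates that idea in software by adding four unsigned 8-bit lanes packed into a 32-bit word without letting carries cross lane boundaries. It illustrates the general concept only; the instruction set of the ASIP in the paper is not described in the abstract, and `packed_add_u8x4` is a hypothetical name.

```python
LOW7 = 0x7F7F7F7F   # low seven bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane


def packed_add_u8x4(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits keeps every carry inside its own lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is a XOR b XOR carry-in; the carry-in is already in low.
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


def scalar_add_u8x4(a: int, b: int) -> int:
    """Reference: the same result computed one lane at a time."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (total & 0xFF) << shift
    return result


if __name__ == "__main__":
    print(hex(packed_add_u8x4(0x01FF0280, 0x01010180)))
```

Pixel data in motion estimation is typically 8 bits wide, so packing several pixels per word lets a small datapath process them together.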
DOI: https://doi.org/10.1109/ASAP.1997.606846 (published 1997-07-14)
Citations: 8
The processing graph method tool (PGMT)
R. S. Stevens
To acquire state-of-the-art hardware at reduced cost, the U.S. Navy is committed to buying commercial off-the-shelf (COTS) computer hardware. In this rapidly changing technological world, today's hardware will be obsolete tomorrow. The Navy's complex problems often require more computational power than can be delivered by a single serial processor. The solution lies in distributed processing. However, distributed processors tend to have architecture-specific languages, requiring an expensive and time-consuming manual rewrite of application software as new technology and new machines become available. The processing graph method (PGM), developed at the Naval Research Laboratory (NRL) in Washington, DC, is an architecture-independent method for specifying application software for distributed architectures. Its model of computation is reconfigurable dynamic data flow: dynamic because the amount of data consumed and produced by an actor may vary from one firing to another, and reconfigurable because a graph may be disassembled and reassembled. PGM was implemented on the Navy Standard Signal Processor (AN/UYS-2), and on VAX and Sun workstations. The PGMT project at NRL is developing a tool set that will facilitate the implementation of PGM on a given distributed architecture at relatively low cost. We describe the major features of PGM and discuss the PGMT project.
DOI: https://doi.org/10.1109/ASAP.1997.606832 (published 1997-07-14)
Citations: 7
A modular element for shared buffer ATM switch fabrics
Mike Parks
This paper presents the architecture of a modular element for the design of shared buffer ATM switch fabrics. The component is designed for deployment in a bit-sliced approach, and includes mechanisms to allow the number of elements in the fabric to be matched to the required aggregate bandwidth of the switch. All of the input ports must be synchronized to a Start of Cell input signal; the output ports optionally can be synchronized via an Output Hold signal. A bus forwards a portion of each incoming cell to a separate controller for identification and prioritization of the corresponding output operations. In addition to supporting width expansion for increased bandwidth, the component is designed to support depth expansion for more cell storage capacity at a given aggregate throughput. The component includes 32 one-bit inputs, 32 one-bit outputs, and 4 megabits of static RAM storage. Eight of the 100 MHz devices comprise a 32 port ATM switch fabric with an aggregate bandwidth of 20 gigabits per second and a storage capacity of 64 K × 512 bits.
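The storage figure in the last sentence can be checked with simple arithmetic. The sketch below assumes binary megabits (4 × 2^20 bits per device) and reads "64 K" as 65,536; both are assumptions, as are the helper names. It also computes the raw input rate of eight one-bit slices per port at 100 MHz, which is only an upper bound: it comes out above the 20 Gb/s quoted in the abstract, and the abstract does not say how the difference arises.

```python
MEGABIT = 2 ** 20  # assumption: binary megabits


def storage_words(devices: int, bits_per_device: int, word_bits: int) -> int:
    """Number of word_bits-wide words held by the combined SRAM of all devices."""
    return devices * bits_per_device // word_bits


def raw_input_rate_bps(ports: int, slices: int, clock_hz: int) -> int:
    """Raw bits per second if every port clocks in one bit per slice per cycle."""
    return ports * slices * clock_hz


if __name__ == "__main__":
    words = storage_words(devices=8, bits_per_device=4 * MEGABIT, word_bits=512)
    print(words, words == 64 * 1024)
    print(raw_input_rate_bps(ports=32, slices=8, clock_hz=100_000_000) / 1e9, "Gb/s raw")
```

Eight devices with 4 megabits each give 32 megabits in total, which is exactly 64 K words of 512 bits, so the quoted capacity is consistent with width expansion across the eight bit slices.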
DOI: https://doi.org/10.1109/ASAP.1997.606848 (published 1997-07-14)
Citations: 0
A methodology for user-oriented scalability analysis
D. Royo, M. Valero-García, Antonio González, Carme Mari
Scalability analysis provides information about the effectiveness of increasing the number of resources of a parallel system. Several methods have been proposed which use different approaches to provide this information. This paper presents a family of analysis methods oriented to the user. The methods in this family should assist the user in estimating the benefits when increasing the system size. The key issue in the proposal is the appropriate combination of a scaling model, which reflects the way the users utilize an increasing number of resources, and a figure of merit that the user wants to improve with the larger system. Another important element in the proposal is the approach to characterize the scalability, which enables quick visual analyses and comparisons. Finally, three concrete examples of methods belonging to the proposed family are introduced in this paper.
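To make the pairing of a scaling model with a figure of merit concrete, the sketch below uses two well-known scaling models with speedup as the figure of merit: fixed problem size (Amdahl's law) and problem size grown with the machine (Gustafson's law). These stand in for the idea only; the abstract does not specify the three methods introduced in the paper, and the function names are illustrative.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup when the problem size stays fixed as processors are added."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speedup when the parallel part of the work grows with the machine."""
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 32, 64):
        print(p, round(amdahl_speedup(0.1, p), 2), round(gustafson_speedup(0.1, p), 2))
```

The same machine and the same serial fraction look very different under the two models, which is why the choice of scaling model has to reflect how users actually exploit a larger system.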
DOI: https://doi.org/10.1109/ASAP.1997.606836 (published 1997-07-14)
Citations: 0
A multiprocessor system for real time high resolution image correlation
M. Cavadini, M. Wosnitza, M. Thaler, G. Tröster
In this paper, a dedicated multiprocessor architecture for a real-time implementation of the normalized cross correlation function (NCCF) on images up to 1024×1024 pixels is presented. The computational requirements are dramatically reduced by calculating this algorithm in the frequency domain. In contrast to a standard implementation of the NCCF, which inherently imposes rectangular templates, the proposed enhanced method also allows searching for free-form templates, which may even include holes. The computation in the frequency domain is based on a single program multiple data (SPMD) architecture which includes a dedicated ASIC for the computation of the 1D complex FFT. Besides this specific part of the system, the image pre- and post-processing tasks are supported by general-purpose DSPs. A system consisting of 4 ASICs and 2 SHARC DSPs is able to compute the enhanced NCCF of a free-form template on images of 1024×1024 pixels within 134 ms (8 frames/s).
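The saving from working in the frequency domain comes from the correlation theorem: a correlation over all shifts becomes an element-wise product of spectra. The sketch below shows this in one dimension for the unnormalized circular cross-correlation only, using a small recursive radix-2 FFT. It does not include the normalization terms of the NCCF or the free-form template handling from the paper, and all function names are illustrative. The input length is assumed to be a power of two.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/n scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def circular_xcorr_fft(signal, template):
    """c[s] = sum_j template[j] * signal[(s + j) mod n], computed via the FFT."""
    n = len(signal)
    padded = list(template) + [0] * (n - len(template))
    spectrum = [f * t.conjugate() for f, t in zip(fft(signal), fft(padded))]
    return [v.real / n for v in fft(spectrum, invert=True)]


def circular_xcorr_direct(signal, template):
    """The same quantity computed directly, for comparison."""
    n = len(signal)
    return [sum(template[j] * signal[(s + j) % n] for j in range(len(template)))
            for s in range(n)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, 0, 0, 0]
    t = [1, 1]
    print(circular_xcorr_direct(x, t))
    print([round(v, 6) for v in circular_xcorr_fft(x, t)])
```

The direct form costs on the order of n × m operations for template length m, while the FFT route costs on the order of n log n regardless of template size, which is where the reduction for large images and templates comes from.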
DOI: https://doi.org/10.1109/ASAP.1997.606843 (published 1997-07-14)
Citations: 4