
[1992] Proceedings of the International Conference on Application Specific Array Processors: latest papers

Advanced technology for improved signal processor efficiency
E. Swartzlander
Wafer scale integration technology offers the promise of implementing application specific processors with significantly higher data rates, lower power, and smaller size than conventional VLSI implementations. Wafer scale integration implementations replace most of the signal lines between chips with intra-wafer lines that exhibit one to two orders of magnitude less stray capacitance so they may be driven at higher rates while consuming much less power. Application specific processors implemented with regular arrays of processing elements are attractive because their regularity simplifies the design, fabrication, and circumvention of faulty elements. This paper shows that one dimensional systolic arrays are more attractive for this application than other regular architectures. This paper also shows that (1:N) and (M:N) pooled sparing at the macrocell level is feasible to overcome the defects implicit in the fabrication process. Finally an example design for a systolic FFT processor is described to illustrate the wafer scale implementation of a signal processor.
DOI: https://doi.org/10.1109/ASAP.1992.218567 · Published: 1992-08-04
Citations: 1
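To make the pooled sparing claim in the abstract above more concrete, the sketch below estimates the chance that an array still has enough working macrocells when a pool of spares is shared by the required cells. It assumes independent defects with a single per-cell yield `p_cell`, and it reads (M:N) as M spares backing N required cells; both are simplifying assumptions made here, not a model taken from the paper, and the name `pooled_sparing_yield` is hypothetical.

```python
from math import comb


def pooled_sparing_yield(n_required: int, m_spares: int, p_cell: float) -> float:
    """Probability that at least n_required of (n_required + m_spares) cells work.

    Assumes independent cell failures and that any spare can replace any
    faulty cell in the pool (an idealised pooled scheme).
    """
    total = n_required + m_spares
    return sum(
        comb(total, k) * p_cell**k * (1.0 - p_cell) ** (total - k)
        for k in range(n_required, total + 1)
    )


# (1:N) sparing is the special case m_spares = 1.
no_spare = pooled_sparing_yield(8, 0, 0.9)    # reduces to 0.9 ** 8
one_spare = pooled_sparing_yield(8, 1, 0.9)
two_spares = pooled_sparing_yield(8, 2, 0.9)
```

Under these assumptions each extra spare raises the array yield, which is the qualitative effect the abstract relies on; real wafers may show clustered defects that this binomial model ignores.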
On partitioning of multistage algorithms and design of intermediate memories
M. Sauer, E. Bernard, J. Nossek
Partitioning of a class of algorithms with global data dependencies, called multistage algorithms, is investigated. Partitioning requires intermediate results of computations of a specific block of the partition to be stored in an intermediate memory. Furthermore, a decomposition of the global interconnection structure of the algorithm is necessary. The authors outline a design methodology for the intermediate memories which perform the data rearrangements according to the interconnection relation and that consist of locally connected synchronous modules. Additionally, procedures for deriving control signals for the intermediate memory are presented, which can serve as a basis for control minimization.
DOI: https://doi.org/10.1109/ASAP.1992.218579 · Published: 1992-08-04
Citations: 1
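The role of an intermediate memory described in the abstract above can be pictured as a buffer that is written in one order and read in another. The sketch below uses the perfect shuffle, a common interconnection between stages of FFT-like algorithms, purely as an assumed example; the abstract does not say which interconnection relations the authors treat, and the function names are hypothetical.

```python
def perfect_shuffle(index: int, bits: int) -> int:
    """Rotate the `bits`-bit binary representation of index left by one place."""
    msb = (index >> (bits - 1)) & 1
    return ((index << 1) & ((1 << bits) - 1)) | msb


def rearrange(block: list, bits: int) -> list:
    """Model of an intermediate memory: write in natural order, read shuffled."""
    out = [None] * len(block)
    for i, value in enumerate(block):
        out[perfect_shuffle(i, bits)] = value
    return out


stage_output = list(range(8))
next_stage_input = rearrange(stage_output, bits=3)
```

For eight values the rearrangement interleaves the two halves of the block, and applying it three times restores the original order because the address rotation has period 3.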
Associative information processing: algorithms and system
Werner Pöchmüller, A. König, M. Glesner
Associative systems provide a flexibility ranging far beyond the scope of a conventional associative memory which simply provides a parallel search within a large amount of keywords to retrieve associated information. This paper presents several approaches to associative data processing. Algorithms are discussed that can easily be implemented or supported on an array computer. By means of dedicated VLSI chips a prototype array computer was implemented at Darmstadt University of Technology. Together with simulations on conventional sequential computers, this array computer serves to prove the validity of developed algorithms on a running system.
DOI: https://doi.org/10.1109/ASAP.1992.218546 · Published: 1992-08-04
Citations: 0
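The "conventional associative memory" that the abstract above uses as its baseline can be sketched as a masked keyword match over all stored entries. This is only an illustration of that baseline, not of the authors' algorithms; the list comprehension stands in for a comparison that hardware would perform on all entries in parallel, and the names `associative_search`, `store`, `key` and `mask` are hypothetical.

```python
def associative_search(store, key: int, mask: int) -> list:
    """Return the data of every entry whose tag equals key on the bits set in mask."""
    return [data for tag, data in store if (tag ^ key) & mask == 0]


store = [(0b1010, "a"), (0b1011, "b"), (0b0110, "c")]

exact = associative_search(store, key=0b1010, mask=0b1111)
ignore_lsb = associative_search(store, key=0b1010, mask=0b1110)
```

Clearing a mask bit turns that bit position into a "don't care", so one query can retrieve several associated entries.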
An integrated system for rapid prototyping of high performance algorithm specific data paths
D. Chen, L. Guerra, E. Ng, M. Potkonjak, D. P. Schultz, J. Rabaey
A system has been developed which targets the rapid prototyping of high performance data computation units which are typical of real-time digital signal processing applications. The hardware platform of the system is a family of multiprocessor integrated circuits. The prototype chip of this family contains 8 processors connected via a dynamically controlled crossbar switch. With a maximum clock rate of 25 MHz, it can support a computation rate of 200 MIPS and can sustain a data I/O bandwidth of 400 MByte/sec. An assembler and simulator provide low-level programmability of the hardware. A compiler which takes input described in the high-level data flow language Silage, and performs estimation, transformations, partitioning, assignment, and scheduling before generating assembly code, provides an automated software compilation path.
DOI: https://doi.org/10.1109/ASAP.1992.218576 · Published: 1992-08-04
Citations: 27
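The performance figures quoted in the abstract above are consistent with each other under one simple assumption, namely that each of the 8 processors completes one operation per clock cycle. That assumption is made here for the arithmetic check and is not stated in the abstract.

```python
processors = 8
clock_hz = 25e6            # 25 MHz maximum clock rate
io_bytes_per_s = 400e6     # 400 MByte/s sustained I/O bandwidth

# Assumption: one operation per processor per clock cycle.
peak_mips = processors * clock_hz / 1e6

# I/O volume the chip must move in every clock cycle to sustain that bandwidth.
io_bytes_per_cycle = io_bytes_per_s / clock_hz
```

With these inputs `peak_mips` comes out at 200 and `io_bytes_per_cycle` at 16, which would correspond to 2 bytes per processor per cycle if the traffic were spread evenly.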
ARREST: an interactive graphic analysis tool for VLSI arrays
W. Burleson, Bongjin Jung
The authors present a graphical CAD tool, Array Estimator (ARREST), for VLSI array architectures. In real VLSI arrays, piece-wise regular computations are spread across space and time and occur at a fine grain, which can make visualization quite difficult. Consequently, a graphical interface environment is desirable to enhance the design, verification, and analysis of VLSI arrays by providing feedback at all levels of the design process. ARREST reads a high level description of structured VLSI algorithms in terms of affine recurrence equations (AREs) and permits a broad range of transformations on the algorithm. The system does not target a fully automated design process; instead, it provides a designer with a means to systematically explore various array architectures and evaluate design trade-offs between VLSI cost and performance. To allow a human designer better insight into the design process, ARREST uses the Xt/MOTIF window system for graphics and interfaces to the Cadence VERILOG simulator.
DOI: https://doi.org/10.1109/ASAP.1992.218575 · Published: 1992-08-04
Citations: 13
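As a small illustration of what "an algorithm described by recurrence equations and transformed into an array" can mean in the abstract above, the sketch below takes a square matrix-vector product written as the recurrence Y(i, j) = Y(i, j-1) + a[i][j] * x[j] and assigns every index point a time step and a processing element (PE). The schedule t = i + j and the allocation p = i are example choices made here; nothing in the abstract says ARREST uses them, and `matvec_systolic` is a hypothetical name.

```python
def matvec_systolic(a, x):
    """Evaluate y = a @ x for a square matrix via a space-time mapped recurrence.

    Index point (i, j) runs at time t = i + j on processing element p = i.
    Returns the result vector and the number of time steps used.
    """
    n = len(x)
    # Group index points by time step, checking that no PE is used twice per step.
    steps = {}
    for i in range(n):
        for j in range(n):
            t, p = i + j, i
            slot = steps.setdefault(t, {})
            assert p not in slot, "two computations mapped to one PE in one step"
            slot[p] = (i, j)

    acc = [0] * n  # one accumulator register per PE
    for t in sorted(steps):
        for p, (i, j) in steps[t].items():
            acc[p] += a[i][j] * x[j]
    return acc, len(steps)


y, latency = matvec_systolic([[1, 2], [3, 4]], [5, 6])
```

For an n by n matrix this mapping uses n PEs and 2n - 1 time steps; a different schedule or projection would trade those two numbers against each other, which is the kind of exploration the abstract describes.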
Heterogeneous digital signal processing systems for sonar
T. E. Curtis
Current operational UK sonars use processors with throughputs in excess of five hundred million arithmetic operations per second. Several orders of magnitude increase in computing power are required to maintain long range surveillance capabilities in the 1990s and, within the next decade, typical applications will need throughputs approaching one million million (10^12) arithmetic operations per second, significantly greater than that currently achieved with fifth generation computers. This paper discusses some of the problems in realising systems with this level of performance.
DOI: https://doi.org/10.1109/ASAP.1992.218564 · Published: 1992-08-04
Citations: 0
On cycle borrowing analyses for interconnected chips driven by clocks having different but commensurable speeds
G. Jennings
The author considers the construction of synchronous systems having components driven at different rates by different, but commensurable, clocks. Furthermore, these systems are to be constructed using level-sensitive latches with the intent of exploiting cycle borrowing over the entire system. The author presents a framework in which the entire system is managed as a single clocked entity, and investigates a timing analysis technique for such systems. Results for small examples are presented. The interface between such chips is studied; no resynchronizers are required. Alternate clock waveforms, and their effect on analysis complexity, are discussed.
DOI: https://doi.org/10.1109/ASAP.1992.218580 · Published: 1992-08-04
Citations: 3
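Clocks are commensurable when the ratio of their periods is rational, which is what allows the whole system in the abstract above to be treated as a single clocked entity: there is a common frame after which all clock edges repeat. The sketch below computes that frame length with exact rational arithmetic. The example frequencies and the name `hyperperiod` are assumptions for illustration and do not come from the paper.

```python
from fractions import Fraction
from math import gcd, lcm


def hyperperiod(periods: list[Fraction]) -> Fraction:
    """Smallest time span that is a whole multiple of every clock period.

    For fractions in lowest terms, the least common multiple is
    lcm(numerators) / gcd(denominators).
    """
    num = lcm(*(p.numerator for p in periods))
    den = gcd(*(p.denominator for p in periods))
    return Fraction(num, den)


# Hypothetical example: a 25 MHz clock (40 ns) and a 37.5 MHz clock (80/3 ns).
periods_ns = [Fraction(40), Fraction(80, 3)]
frame = hyperperiod(periods_ns)
cycles_per_frame = [frame / p for p in periods_ns]
```

Here the common frame is 80 ns, containing 2 cycles of the slower clock and 3 of the faster one; a timing analysis over one such frame covers every relative alignment of the two clocks. The multi-argument forms of `math.lcm` and `math.gcd` need Python 3.9 or later.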
A transformative approach to the partitioning of processor arrays
J. Teich, L. Thiele
The paper describes the systematic design of processor arrays with a given dimension and a given number of processing elements. The unified approach to the solution of this problem, called partitioning, is based on the following concepts: (1) Algorithms and processor arrays are represented by (piecewise regular) programs. (2) The concept of stepwise refinement of programs is used to solve the partitioning problem by applying a sequence of provably correct program transformations. In contrast to other approaches, nonperfect tilings may be considered. The parameters of the introduced program transformations enable the realization of different partitioning schemes. (3) It is shown that the class of piecewise regular programs is closed under partitioning.
DOI: https://doi.org/10.1109/ASAP.1992.218585 · Published: 1992-08-04
Citations: 17
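A "nonperfect tiling", as mentioned in the abstract above, arises when the index space does not divide evenly into tiles, so some boundary tiles are only partly filled. The one-dimensional sketch below shows that situation and one possible assignment in which each tile goes to its own processing element and the points inside a tile run sequentially. It is a toy illustration under those assumptions, not the program transformations of the paper, and the function names are hypothetical.

```python
def partition(n_points: int, tile: int) -> list[range]:
    """Split indices 0..n_points-1 into consecutive tiles of at most `tile` points."""
    if tile <= 0:
        raise ValueError("tile size must be positive")
    return [range(s, min(s + tile, n_points)) for s in range(0, n_points, tile)]


def assign(index: int, tile: int) -> tuple[int, int]:
    """Map an index to (processing element, local time step) for that tiling."""
    return index // tile, index % tile


tiles = partition(10, 4)            # 10 is not a multiple of 4: last tile is partial
sizes = [len(t) for t in tiles]
pe, step = assign(9, 4)
```

The last tile holds only 2 of its 4 slots, so the third processing element is idle for part of the schedule; handling such boundary cases inside the same program class is what the abstract's closure result is about.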
Parallel architecture for a pel-recursive motion estimation algorithm
Emmanuel D. Frimout, J. Driessen, E. Deprettere
The paper presents a parallel architecture for a pel-recursive motion estimation algorithm. It is a linear array of processors, each consisting of an initialization part, a data-routing part and an updating part. The initializing part performs a prediction of the motion vector. The routing parts constitute a routing path along which previous-frame data is routed from processors that store to processors that request such data. A clocked version of the router is presented in some detail. The updating part calculates an update to the predicted motion vector. The architecture proposed is derived in a systematic way and is parameterized with respect to certain window sizes. It is thus completely different from the few existing pel-recursive motion estimation architectures.
DOI: https://doi.org/10.1109/ASAP.1992.218545 · Published: 1992-08-04
Citations: 4
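The "predict, then update" structure in the abstract above matches the general shape of pel-recursive estimators, where a predicted displacement is corrected using the displaced frame difference (DFD). The sketch below shows a classic steepest-descent update in one dimension. It is an assumed textbook-style variant chosen for illustration, not necessarily the algorithm the paper implements, and the frames are hypothetical continuous functions so that no interpolation is needed.

```python
def pel_recursive_update(prev, cur, x, d_pred, eps, h=1e-3):
    """One steepest-descent correction of a predicted displacement d_pred.

    prev, cur: callables giving previous-frame and current-frame intensity.
    The DFD is cur(x) - prev(x - d_pred); the update moves d_pred against
    the gradient of the squared DFD.
    """
    dfd = cur(x) - prev(x - d_pred)
    # Central-difference estimate of the previous-frame spatial gradient.
    grad = (prev(x - d_pred + h) - prev(x - d_pred - h)) / (2 * h)
    return d_pred - eps * dfd * grad


def prev_frame(x):
    return 2.0 * x                    # hypothetical intensity ramp


true_d = 1.5


def cur_frame(x):
    return prev_frame(x - true_d)     # previous frame shifted by true_d


d = 0.0                               # initial prediction
for _ in range(30):
    d = pel_recursive_update(prev_frame, cur_frame, x=10.0, d_pred=d, eps=0.1)
```

On this ramp the error shrinks by a constant factor on every pass, so `d` approaches `true_d`; on real images the step size `eps` and the gradient estimate strongly affect convergence.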
Pipelining: just another transformation
M. Potkonjak, J. Rabaey
A simple formulation of pipelining: 'Pipelining with N stages is equivalent to retiming where the number of delays on all inputs or all outputs, but not both, is increased by N' is used as the basis for a convenient and efficient treatment of pipelining in the design of application specific computers. Classification of pipelining according to the optimization goal (throughput and resource utilization) and the latency is introduced. For polynomial complexity pipelining classes, optimal algorithms are presented. For other classes, both proof of NP-completeness and efficient probabilistic algorithms are presented. Both theoretical and experimental properties of pipelining are discussed. In particular, a relationship with other transformations is explored. Due to the close relationship between software pipelining and the pipelining presented here, all results can be easily modified for use in compilers for general purpose computers. Also, as a side result, the exact bound (solution) for the iteration bound is derived.
DOI: https://doi.org/10.1109/ASAP.1992.218574 · Published: 1992-08-04
Citations: 20
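The iteration bound mentioned at the end of the abstract above is usually defined, in the signal processing synthesis literature, as the maximum over all cycles of a data flow graph of the cycle's total computation time divided by its number of delays. The abstract does not restate the definition, so that reading is an assumption here. The sketch below evaluates it by brute-force enumeration of simple cycles on a small hypothetical graph; it is not the derivation from the paper and only suits small graphs with integer node times.

```python
from fractions import Fraction


def iteration_bound(node_time: dict, edges: list) -> Fraction:
    """Max over simple cycles of (total node time) / (total delays).

    node_time: node -> integer computation time
    edges: (src, dst, delays) triples
    """
    succ = {}
    for u, v, d in edges:
        succ.setdefault(u, []).append((v, d))

    best = Fraction(0)

    def dfs(start, node, visited, time, delays):
        nonlocal best
        for nxt, d in succ.get(node, []):
            if nxt == start:
                total_delays = delays + d
                if total_delays == 0:
                    raise ValueError("delay-free cycle: graph is not computable")
                best = max(best, Fraction(time, total_delays))
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than start so each cycle is found once,
                # rooted at its smallest node.
                dfs(start, nxt, visited | {nxt}, time + node_time[nxt], delays + d)

    for s in sorted(node_time):
        dfs(s, s, {s}, node_time[s], 0)
    return best


times = {"A": 1, "B": 2, "C": 1}
graph = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "B", 1)]
bound = iteration_bound(times, graph)
```

In this example the loop through B and C has 3 time units of computation and a single delay, so it limits the sample period to 3, while the loop through A and B (3 time units, 2 delays) is less critical. Retiming or pipelining moves delays around but leaves each cycle's delay count unchanged, which is why this bound survives those transformations.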