"Algorithms and architectures for high performance recursive filtering"
S. E. McQuillan, J. McCanny
[1992] Proceedings of the International Conference on Application Specific Array Processors
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218569
Abstract: Recently, a number of most significant digit (msd) first bit-parallel multipliers for recursive filtering have been reported. However, the design approaches used have generally been heuristic, and consequently optimality has not always been assured. In this paper, msd-first multiply-accumulate algorithms are described, and important relationships governing the dependencies between latency, number representations, and related parameters are derived. A more systematic approach to designing recursive filters is illustrated by applying the algorithms and associated relationships to the design of cascadable modules for high sample rate IIR filtering and wave digital filtering.
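The msd-first idea in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm, just a generic signed-digit recurrence assuming a fractional operand in (-1, 1): digits emerge most significant first, so a downstream stage can start consuming the result before the remaining digits are known.

```python
def msd_first_digits(v, n):
    """Emit n signed digits d_i in {-1, 0, 1} most significant first, so that
    v ~= sum(d_i * 2**-(i+1)); the residual stays bounded in (-1, 1)."""
    digits = []
    for _ in range(n):
        d = max(-1, min(1, round(2 * v)))  # select the next most significant digit
        v = 2 * v - d                      # bounded residual carries on
        digits.append(d)
    return digits

# Partial sums converge from the most significant end:
digits = msd_first_digits(0.3125, 8)
value = sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))
```

After n digits the reconstruction error is at most 2^-n, which is what lets latency be traded against precision in msd-first arithmetic.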
"On partitioning of multistage algorithms and design of intermediate memories"
M. Sauer, E. Bernard, J. Nossek
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218579
Abstract: Partitioning of a class of algorithms with global data dependencies, called multistage algorithms, is investigated. Partitioning requires intermediate results of computations of a specific block of the partition to be stored in an intermediate memory. Furthermore, a decomposition of the global interconnection structure of the algorithm is necessary. The authors outline a design methodology for intermediate memories that perform the data rearrangements according to the interconnection relation and consist of locally connected synchronous modules. Additionally, procedures for deriving control signals for the intermediate memory are presented, which can serve as a basis for control minimization.
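The role of an intermediate memory can be sketched as a buffer that performs the data rearrangement between two blocks of a partitioned algorithm. Here the interconnection relation is just a row-major write followed by a column-major read (a transpose reordering; the paper's relations are more general):

```python
def rearrange(data, rows, cols):
    """Write one block's results row-major into an intermediate memory,
    then read them out column-major for the next block of the partition."""
    memory = [None] * (rows * cols)
    for addr, x in enumerate(data):       # write phase: row-major addresses
        memory[addr] = x
    out = []
    for c in range(cols):                 # read phase: column-major addresses
        for r in range(rows):
            out.append(memory[r * cols + c])
    return out

reordered = rearrange(list(range(6)), rows=2, cols=3)
```

The address sequences of the two phases are exactly the control signals such a memory needs, which is why deriving and minimizing them is worthwhile.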
"Heterogeneous digital signal processing systems for sonar"
T. E. Curtis
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218564
Abstract: Current operational UK sonars use processors with throughputs in excess of five hundred million arithmetic operations per second. An increase in computing power of several orders of magnitude is required to maintain long-range surveillance capabilities in the 1990s; within the next decade, typical applications will need throughputs approaching one million million (10^12) arithmetic operations per second, significantly greater than that currently achieved with fifth-generation computers. This paper discusses some of the problems in realising systems with this level of performance.
"ARREST: an interactive graphic analysis tool for VLSI arrays"
W. Burleson, Bongjin Jung
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218575
Abstract: The authors present a graphical CAD tool, Array Estimator (ARREST), for VLSI array architectures. In real VLSI arrays, piecewise-regular computations are spread across space and time and occur at a fine grain, which can make visualization quite difficult. Consequently, a graphical interface environment is desirable to enhance the design, verification, and analysis of VLSI arrays by providing feedback at all levels of the design process. ARREST reads a high-level description of structured VLSI algorithms in terms of affine recurrence equations (AREs) and permits a broad range of transformations on the algorithm. The system does not target a fully automated design process; instead, it provides a designer with a means to systematically explore various array architectures and evaluate design trade-offs between VLSI cost and performance. To give a human designer better insight into the design process, ARREST uses the Xt/MOTIF window system for graphics and interfaces to the Cadence VERILOG simulator.
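The transformations such a tool explores can be given a flavour with a minimal sketch (a generic matrix-vector recurrence under the classic linear mapping t = i + j, p = i; not ARREST's actual internals): an affine schedule t and placement p assign every recurrence instance a time step and a processing element.

```python
# Recurrence y[i] += a[i][j] * x[j] for 0 <= i, j < n, mapped so that
# instance (i, j) executes at time t = i + j on processing element p = i.
def space_time_map(n):
    return {(i, j): (i + j, i) for i in range(n) for j in range(n)}

mapping = space_time_map(4)
# A legal mapping never schedules two instances on the same PE at the same time:
slots = list(mapping.values())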
"A transformative approach to the partitioning of processor arrays"
J. Teich, L. Thiele
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218585
Abstract: The paper describes the systematic design of processor arrays with a given dimension and a given number of processing elements. The unified approach to the solution of this problem, called partitioning, is based on the following concepts: (1) Algorithms and processor arrays are represented by (piecewise regular) programs. (2) The concept of stepwise refinement of programs is used to solve the partitioning problem by applying a sequence of provably correct program transformations. In contrast to other approaches, nonperfect tilings may be considered. The parameters of the introduced program transformations enable the realization of different partitioning schemes. (3) It is shown that the class of piecewise regular programs is closed under partitioning.
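Two standard partitioning schemes that such transformation parameters can express are sketched below for a one-dimensional index space (a simplification of the paper's setting; the names LSGP and LPGS are the usual literature terms, not necessarily the paper's):

```python
def lsgp(n, p):
    """Locally sequential, globally parallel: each PE steps through one
    contiguous tile; the last tile may be smaller (a nonperfect tiling)."""
    tile = -(-n // p)                         # ceiling division
    return {i: (i // tile, i % tile) for i in range(n)}

def lpgs(n, p):
    """Locally parallel, globally sequential: indices are dealt out
    cyclically, so the array sweeps the index space in successive passes."""
    return {i: (i % p, i // p) for i in range(n)}

# 10 iterations on 4 PEs: LSGP uses tiles of size 3 with a short last tile.
m = lsgp(10, 4)
```

Both maps send each index to a unique (PE, local step) pair, which is the correctness property a provably correct partitioning transformation must preserve.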
"Associative information processing: algorithms and system"
Werner Pöchmüller, A. König, M. Glesner
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218546
Abstract: Associative systems provide flexibility ranging far beyond the scope of a conventional associative memory, which simply provides a parallel search among a large number of keywords to retrieve associated information. This paper presents several approaches to associative data processing. Algorithms are discussed that can easily be implemented or supported on an array computer. By means of dedicated VLSI chips, a prototype array computer was implemented at Darmstadt University of Technology. Together with simulations on conventional sequential computers, this array computer serves to prove the validity of the developed algorithms on a running system.
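The baseline the abstract contrasts against, a conventional associative (content-addressable) memory, can be sketched in a few lines: every stored word is compared against a key in parallel, optionally under a bit mask.

```python
def cam_search(words, key, mask):
    """Return the addresses of every stored word that matches key on the
    bit positions selected by mask. A hardware CAM performs all the
    comparisons in parallel; this model simply iterates."""
    return [addr for addr, w in enumerate(words)
            if (w ^ key) & mask == 0]

# Match on the high nibble only:
hits = cam_search([0x3A, 0x3F, 0x41, 0x30], key=0x3C, mask=0xF0)
```

Associative *processing* generalizes this: instead of only retrieving matches, the matching rows participate in further parallel computation.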
"An integrated system for rapid prototyping of high performance algorithm specific data paths"
D. Chen, L. Guerra, E. Ng, M. Potkonjak, D. P. Schultz, J. Rabaey
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218576
Abstract: A system has been developed which targets the rapid prototyping of high-performance data computation units typical of real-time digital signal processing applications. The hardware platform of the system is a family of multiprocessor integrated circuits. The prototype chip of this family contains 8 processors connected via a dynamically controlled crossbar switch. With a maximum clock rate of 25 MHz, it can support a computation rate of 200 MIPS and can sustain a data I/O bandwidth of 400 Mbyte/s. An assembler and simulator provide low-level programmability of the hardware. A compiler, which takes input described in the high-level data flow language Silage and performs estimation, transformations, partitioning, assignment, and scheduling before generating assembly code, provides an automated software compilation path.
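The stated figures are mutually consistent under two simple assumptions not spelled out in the abstract: one operation per processor per cycle, and an aggregate I/O path of 16 bytes per cycle.

```python
# Consistency check of the quoted performance figures (assumptions flagged).
processors = 8
clock_hz = 25_000_000                  # 25 MHz maximum clock rate
ops_per_cycle = 1                      # assumption: one op per processor per cycle
mips = processors * clock_hz * ops_per_cycle / 1e6

io_bytes_per_cycle = 16                # assumption: 2 bytes per processor per cycle
mbytes_per_sec = clock_hz * io_bytes_per_cycle / 1e6
```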
"On cycle borrowing analyses for interconnected chips driven by clocks having different but commensurable speeds"
G. Jennings
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218580
Abstract: The author considers the construction of synchronous systems having components driven at different rates by different, but commensurable, clocks. Furthermore these systems are to be constructed using level-sensitive latches with the intent of exploiting cycle borrowing over the entire system. The author presents a framework in which the entire system is managed as a single clocked entity, and investigates a timing analysis technique for such systems. Results for small examples are presented. The interface between such chips is studied; no resynchronizers are required. Alternate clock waveforms, and their effect on analysis complexity, are discussed.
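Commensurable clocks are clocks whose periods have a rational ratio, so a common base period exists on which every clock edge falls; that base period is what lets the whole system be analysed as a single clocked entity. A minimal sketch (the gcd of rational periods; our illustration, not the paper's method):

```python
from fractions import Fraction
from math import gcd

def base_period(periods):
    """Greatest common divisor of rational clock periods: the coarsest time
    grid on which every clock edge of every component falls."""
    def frac_gcd(a, b):
        # gcd of fractions = gcd(numerators) / lcm(denominators)
        num = gcd(a.numerator, b.numerator)
        den = a.denominator * b.denominator // gcd(a.denominator, b.denominator)
        return Fraction(num, den)
    result = periods[0]
    for p in periods[1:]:
        result = frac_gcd(result, p)
    return result

# Clocks with 30 ns and 40 ns periods share a 10 ns base period:
base = base_period([Fraction(30), Fraction(40)])
```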
"On metrics of 'super performance' (signal processing systems)"
Y. Wu
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218568
Abstract: Based on information-theoretic principles, metrics for 'super performance' signal processing system design are developed. Conventional supercomputer figures of merit based on throughput measures, such as MFLOPS or GFLOPS, do not account for these basic metrics. The central issue in the design of a signal processing system is efficiency rather than raw processing speed; the critical parameters to consider are the available power and communications resources.
"Pipelining: just another transformation"
M. Potkonjak, J. Rabaey
Pub Date: 1992-08-04 | DOI: 10.1109/ASAP.1992.218574
Abstract: A simple formulation of pipelining: 'Pipelining with N stages is equivalent to retiming where the number of delays on all inputs or all outputs, but not both, is increased by N.' This formulation is used as the basis for a convenient and efficient treatment of pipelining in the design of application-specific computers. A classification of pipelining according to the optimization goal (throughput and resource utilization) and the latency is introduced. For pipelining classes of polynomial complexity, optimal algorithms are presented; for the other classes, both proofs of NP-completeness and efficient probabilistic algorithms are given. Theoretical and experimental properties of pipelining are discussed; in particular, the relationship with other transformations is explored. Due to the close relationship between software pipelining and the pipelining presented here, all results can easily be adapted for use in compilers for general-purpose computers. As a side result, the exact solution for the iteration bound is derived.
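The quoted formulation can be made concrete with a small sketch (a hypothetical graph encoding, not the paper's notation): retiming with per-node lags r changes the delay on edge (u, v) to w + r(v) - r(u), and pipelining by N stages is the special case where only the input nodes get lag -N.

```python
def retime(edges, lag):
    """Apply per-node retiming lags to edge delay counts; a legal retiming
    never produces a negative delay on any edge."""
    out = {}
    for (u, v), w in edges.items():
        w2 = w + lag.get(v, 0) - lag.get(u, 0)
        assert w2 >= 0, f"illegal retiming on edge {(u, v)}"
        out[(u, v)] = w2
    return out

def pipeline(edges, inputs, n):
    """Pipeline by n stages: the retiming that gives every input node lag -n,
    adding exactly n delays on every edge leaving an input while leaving all
    internal edges unchanged."""
    return retime(edges, {node: -n for node in inputs})

# A multiply-accumulate chain in -> mul -> add -> out, initially delay-free:
edges = {("in", "mul"): 0, ("mul", "add"): 0, ("add", "out"): 0}
piped = pipeline(edges, inputs={"in"}, n=2)
```

Casting pipelining as a retiming is what lets the paper reuse retiming machinery for optimality proofs and complexity classification.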