Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606826
D. Fimmel, R. Merker
In this paper the inclusion of hardware constraints into the design of massively parallel processor arrays is considered. We propose an algorithm which determines an optimal scheduling function as well as the selection of components which have to be implemented in one processor of a processor array. The arising optimization problem is formulated as an integer linear program which also takes the necessary chip area of a hardware implementation into consideration. Thereby we assume that an allocation function is given and that a partitioning of the processor array is required to match a limited chip area in silicon.
{"title":"Determination of the processor functionality in the design of processor arrays","authors":"D. Fimmel, R. Merker","doi":"10.1109/ASAP.1997.606826","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606826","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"34 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"122300949"}
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606812
E. Rijpkema, G. Hekstra, E. Deprettere, Jun Ma
In this paper we present a strategy for determining a dataflow processor intended for the execution of Jacobi algorithms, which are found in array processing and other real-time adaptive signal processing applications. Our strategy is to exploit the quasi-regularity of their dependence graph representations in search of what we call the Jacobi processor. This processor emerges from an exploration iteration that starts from a processor template and a set of Jacobi algorithms. Based on qualitative and quantitative performance analysis, both the algorithms and the processor template are restructured towards improved execution performance. To ensure the mapper is part of the emerging processor specification, the algorithm-to-processor mapping method is included in the iterative and hierarchical exploration method. The processor's hierarchy exploits properties related to regularity in the algorithm's structure, allows gentle transitions from regular to irregular levels in the algorithm hierarchy, and offers different control models for the irregular structures that appear at deeper levels of the hierarchy. Transformations aimed at reducing critical paths, increasing throughput, improving mapping efficiency, and minimizing control and flow overheads are essential. They include retiming, pipelining, and lookahead techniques.
{"title":"A strategy for determining a Jacobi specific dataflow processor","authors":"E. Rijpkema, G. Hekstra, E. Deprettere, Jun Ma","doi":"10.1109/ASAP.1997.606812","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606812","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"68 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"114683495"}
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606835
Carsten Reuter, M. Schwiegershausen, P. Pirsch
We propose a novel stochastic approach to the problem of multiprocessor scheduling and allocation under timing and resource constraints using an evolutionary algorithm (EA). For composite schemes of DSP algorithms, a compact problem encoding has been developed with emphasis on the allocation/binding part of the problem, together with an efficient problem transformation-decoding scheme that avoids infeasible solutions and therefore time-consuming repair mechanisms. Thus, the algorithm is able to handle even large problems within moderate computation time. Simulation results comparing the proposed EA with optimal results provided by mixed integer linear programming (MILP) show that the EA achieves the same or similar results, but in much less time as problem size increases.
{"title":"Heterogeneous multiprocessor scheduling and allocation using evolutionary algorithms","authors":"Carsten Reuter, M. Schwiegershausen, P. Pirsch","doi":"10.1109/ASAP.1997.606835","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606835","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"13 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"121876915"}
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606840
Corinne Ancourt, Denis Barthou, C. Guettier, F. Irigoin, Bertrand Jeannet, J. Jourdan, J. Mattioli
This paper presents a technique to automatically map a complete digital signal processing (DSP) application onto a parallel machine with distributed memory. Unlike other applications where coarse- or medium-grain scheduling techniques can be used, DSP applications integrate several thousand tasks and hence necessitate fine-grain considerations. Moreover, finding an effective mapping requires taking into account both architectural resource constraints and real-time constraints. The main contribution of this paper is to show how data partitioning and fine-grain scheduling can be handled and solved under the above operational constraints using concurrent constraint logic programming (CCLP) languages. Our concurrent resolution technique, which handles linear and nonlinear constraints, takes advantage of the special features of signal processing applications and provides a solution equivalent to a manual solution for the representative panoramic analysis (PA) application.
{"title":"Automatic data mapping of signal processing applications","authors":"Corinne Ancourt, Denis Barthou, C. Guettier, F. Irigoin, Bertrand Jeannet, J. Jourdan, J. Mattioli","doi":"10.1109/ASAP.1997.606840","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606840","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"318 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"134067367"}
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606816
Yeong-Kang Lai, Liang-Gee Chen, Yung-Pin Lee
This paper describes a data-interlacing architecture with two-dimensional (2-D) data reuse for the full-search block-matching algorithm. Based on several cascading strategies, the same chips can be flexibly cascaded for different block sizes, search ranges, and pixel rates. In addition, the cascaded chips can efficiently reuse data to decrease external memory accesses and achieve a high throughput rate. Our results demonstrate that the architecture with 2-D data reuse is a flexible, low-pin-count, high-throughput, and cascadable solution for the full-search block-matching algorithm.
{"title":"A flexible data-interlacing architecture for full-search block-matching algorithm","authors":"Yeong-Kang Lai, Liang-Gee Chen, Yung-Pin Lee","doi":"10.1109/ASAP.1997.606816","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606816","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"131159083"}
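For readers unfamiliar with the algorithm that the architecture in the entry above accelerates, the following is a minimal software sketch of full-search block matching. It is not the authors' hardware design: the function names `sad` and `full_search`, the sum-of-absolute-differences cost, and the toy 8x8 frames are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Evaluate every displacement in [-p, p] x [-p, p] that keeps the candidate
    block inside `ref`; return (best_dx, best_dy, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames (hypothetical data): `cur` is `ref` shifted by 2 pixels in x and 1 in y.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, 2, 2, 2, 2))  # (2, 1, 0)
```

Neighbouring candidate blocks overlap in all but one row or column of reference pixels, so a naive implementation like this one rereads the same data many times. That redundancy is the kind of memory traffic a data-reuse architecture aims to remove.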
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606827
R. Andonov, N. Yanev, H. Bourzoufi
We discuss in this paper the problem of finding the optimal tiling transformation of three-dimensional uniform recurrences on a two-dimensional torus/grid of distributed-memory general-purpose machines. We show that even for the simplest case of recurrences that allow such a transformation, the corresponding problem of minimizing the total running time is a non-trivial non-linear integer programming problem. For the latter we derive an O(1) algorithm for finding a good approximate solution. The theoretical evaluations and the experimental results show that the obtained solution approximates the original minimum sufficiently well in the context of the considered problem. Such analytical results are of obvious interest and can be successfully used in parallelizing compilers as well as in performance tuning of parallel codes.
{"title":"Three-dimensional orthogonal tile sizing problem: mathematical programming approach","authors":"R. Andonov, N. Yanev, H. Bourzoufi","doi":"10.1109/ASAP.1997.606827","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606827","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"12 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"133273455"}
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606821
M. Schulte, J. Stine
This paper presents a high-speed method for accurate function approximations. This method employs parallel table lookups followed by multi-operand addition. It takes advantage of leading zeros and symmetry in the table entries to reduce the table sizes. By increasing the number of tables and the number of operands in the multi-operand addition, the amount of memory is significantly reduced. This method provides a closed-form solution for the table entries and can be applied to a variety of elementary functions. Compared to conventional table lookups, it requires two to three orders of magnitude less memory. The design of elementary function generators that use this method is presented and compared to similar methods for elementary function generation.
{"title":"Accurate function approximations by symmetric table lookup and addition","authors":"M. Schulte, J. Stine","doi":"10.1109/ASAP.1997.606821","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606821","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"19 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127497882"}
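The idea of replacing one large table with two smaller lookups and an addition, as described in the entry above, can be illustrated with a simplified two-table sketch. This is an assumption-laden illustration rather than the authors' exact scheme: the input is split into three bit fields of widths `n0`, `n1`, `n2` (hypothetical names), the first table holds function values, the second holds a first-order correction, and table entries are kept as unquantized floats. The correction is antisymmetric about the midpoint of the low-order field, so only half of it is stored.

```python
import math


def build_tables(f, df, n0, n1, n2):
    """Build a value table a0[i0][i1] and a half-size correction table a1[i0][k]
    for x in [0, 1) split into bit fields of n0, n1 and n2 bits (n2 >= 1)."""
    n = n0 + n1 + n2
    d1 = (2**n1 - 1) / 2 * 2.0**-(n0 + n1)  # midpoint of the x1 field
    d2 = (2**n2 - 1) / 2 * 2.0**-n          # midpoint of the x2 field
    a0 = [[f(i0 * 2.0**-n0 + i1 * 2.0**-(n0 + n1) + d2)
           for i1 in range(2**n1)] for i0 in range(2**n0)]
    half = 2**(n2 - 1)
    # Store only the upper half of the x2 range; the lower half is its negation.
    a1 = [[df(i0 * 2.0**-n0 + d1 + d2) * (i2 * 2.0**-n - d2)
           for i2 in range(half, 2**n2)] for i0 in range(2**n0)]
    return a0, a1


def approx(a0, a1, n0, n1, n2, i):
    """Approximate f(i / 2**(n0+n1+n2)) with two lookups and one addition."""
    i0 = i >> (n1 + n2)
    i1 = (i >> n2) & (2**n1 - 1)
    i2 = i & (2**n2 - 1)
    half = 2**(n2 - 1)
    if i2 >= half:
        corr = a1[i0][i2 - half]
    else:
        # Mirror index about the midpoint and negate (the symmetry being exploited).
        corr = -a1[i0][(2**n2 - 1 - i2) - half]
    return a0[i0][i1] + corr


a0, a1 = build_tables(math.sin, math.cos, 4, 4, 4)
entries = sum(len(row) for row in a0) + sum(len(row) for row in a1)
worst = max(abs(approx(a0, a1, 4, 4, 4, i) - math.sin(i / 4096)) for i in range(4096))
print(entries, worst)
```

With 4-bit fields this sketch stores 384 entries instead of the 4096 a single direct table would need for the same 12-bit input. A hardware design would additionally fix the word length of each entry, which is not modelled here.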
Pub Date: 1997-07-14 DOI: 10.1109/ASAP.1997.606824
P. Pirsch, H. Stolberg
An overview of architectures for implementations of current video compression schemes is given. Dedicated as well as programmable approaches are discussed. Examples of dedicated function-specific implementations include architectures for DCT and block matching. For programmable video signal processors, a number of architectural measures to increase video compression performance are reviewed. Actual implementations of video compression schemes typically employ a variety of different architectural approaches. The detailed mix of approaches depends on the targeted application spectrum.
{"title":"Architectural approaches for video compression","authors":"P. Pirsch, H. Stolberg","doi":"10.1109/ASAP.1997.606824","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606824","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"10 1","pages":"0"},"publicationDate":"1997-07-14","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127876731"}
Pub Date: 1900-01-01 DOI: 10.1109/ASAP.1997.606830
F. de Dinechin
This paper presents a method, based on the formalism of affine recurrence equations, for the synthesis of digital circuits exploiting parallelism at the bit-level. In the initial specification of a numerical algorithm, the arithmetic operators are replaced with their yet unscheduled (schedule-free) binary implementation as recurrence equations. This allows a bit-level dependency analysis yielding a bit-parallel array. The method is demonstrated on the example of the matrix-vector product, and discussed.
{"title":"Libraries of schedule-free operators in Alpha","authors":"F. de Dinechin","doi":"10.1109/ASAP.1997.606830","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606830","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"120 1","pages":"0"},"publicationDate":"1900-01-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"122575552"}
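As a rough illustration of what it means to express an arithmetic operator as recurrence equations over bit positions, as the entry above describes, the sketch below writes binary addition in that form. It is plain Python rather than Alpha syntax, and the names `add_bits` and `to_int` are hypothetical; the loop order is only one valid way of evaluating the equations, which themselves impose no schedule beyond the dependence of `c[i + 1]` on `c[i]`.

```python
def add_bits(a, b, width):
    """Addition of two `width`-bit non-negative integers as a recurrence on bit index i:
         c[0]     = 0
         s[i]     = a[i] xor b[i] xor c[i]
         c[i + 1] = majority(a[i], b[i], c[i])
    Returns the sum bits s[0..width], least significant first."""
    abits = [(a >> i) & 1 for i in range(width)]
    bbits = [(b >> i) & 1 for i in range(width)]
    c = [0] * (width + 1)
    s = [0] * (width + 1)
    for i in range(width):
        s[i] = abits[i] ^ bbits[i] ^ c[i]
        c[i + 1] = (abits[i] & bbits[i]) | (abits[i] & c[i]) | (bbits[i] & c[i])
    s[width] = c[width]  # the final carry becomes the top sum bit
    return s


def to_int(bits):
    """Reassemble an integer from a list of bits, least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


print(to_int(add_bits(13, 7, 4)))  # 20
```

Substituting equations like these for each `+` in a word-level specification exposes the carry chain to the dependency analysis, which is what allows a bit-level schedule to be derived.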