"Experimental Results about MPI Collective Communication Operations"
M. Bernaschi, G. Iannello, S. Crea
Parallel Processing Letters, pp. 774-783. Pub Date: 1999-04-12. DOI: 10.1142/S0129626405002179

Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with our implementation, which is based on a new algorithm. Our tests show that existing all-software implementations can be improved, and they highlight the advantages of the Cray hardware-assisted implementation.
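A hedged sketch of why software broadcast cost grows logarithmically: the binomial-tree schedule below delivers a message from the root to p ranks in ceil(log2 p) communication steps. This is a standard all-software strategy, shown purely for illustration; the abstract does not describe the paper's new algorithm, and all names here are ours.

```python
# Illustrative simulation of a binomial-tree broadcast over p ranks.
# In step k, every rank that already holds the data forwards it to the
# partner whose rank differs in bit k; the set of holders doubles per step.
import math

def binomial_broadcast(p, root=0):
    """Return (number of steps, send schedule as [(step, src, dst), ...])."""
    has_data = {root}
    schedule = []
    step = 0
    while len(has_data) < p:
        senders = list(has_data)          # snapshot: new holders wait a step
        for src in senders:
            dst = src ^ (1 << step)       # partner differs in bit `step`
            if dst < p and dst not in has_data:
                has_data.add(dst)
                schedule.append((step, src, dst))
        step += 1
    return step, schedule

steps, sched = binomial_broadcast(8)
print(steps)   # 3 steps for 8 ranks, i.e. ceil(log2 8)
```

The same schedule works for non-power-of-two rank counts: partners outside the communicator are simply skipped, and the step count stays at ceil(log2 p).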
"A Note on Communication-Efficient Deterministic Parallel Algorithms for Planar Point Location and 2D Voronoï Diagram"
Mohamadou A. Diallo, Afonso Ferreira, A. Rau-Chaplin
Parallel Processing Letters, pp. 399-409. Pub Date: 1998-02-25. DOI: 10.1142/S0129626401000622

In this note we describe deterministic parallel algorithms for planar point location and for building the Voronoi diagram of n co-planar points. These algorithms are designed for BSP/CGM-like models of computation, in which p processors, each with local memory, communicate through an arbitrary interconnection network. They are communication-efficient, requiring O(1) and O(log p) communication steps, respectively, with local computation at each step. Both algorithms require local memory.
"Wormhole Deadlock Prediction"
M. D. Ianni
Parallel Processing Letters, pp. 188-195. Pub Date: 1997-08-26. DOI: 10.1142/S0129626400000287

Deadlock prevention is usually realized by imposing strong restrictions on packet transmissions in the network, so the resulting deadlock-free routing algorithms are not optimal with respect to resource utilization. This optimality requirement can be met by forbidding a transmission only when it would bring the network into a configuration that must evolve into a deadlock. Hence, optimal deadlock avoidance is closely related to deadlock prediction. In this paper it is shown that wormhole deadlock prediction is a hard problem; the result is proved for both static and dynamic routing.
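For contrast with the prediction problem the paper proves hard, plain deadlock *detection* in an already-blocked wormhole network is easy: a deadlock is a cycle of channels each waiting for the next. The sketch below (channel names and example graphs are invented for illustration) finds such a cycle by depth-first search.

```python
# Detect a circular wait among channels: `wait_for` maps each channel to
# the set of channels whose buffers it is blocked on. A cycle in this
# wait-for graph corresponds to a set of worms that can never advance.

def has_cycle(wait_for):
    """Return True iff the channel wait-for graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on stack / done
    color = {c: WHITE for c in wait_for}

    def dfs(c):
        color[c] = GRAY
        for nxt in wait_for.get(c, ()):
            if color.get(nxt, WHITE) == GRAY:
                return True            # back edge: cycle of blocked channels
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in wait_for)

# Two worms each holding a channel the other needs -> circular wait.
print(has_cycle({"c0": {"c1"}, "c1": {"c0"}}))   # True
print(has_cycle({"c0": {"c1"}, "c1": set()}))    # False
```

Prediction is harder precisely because it must decide, before granting a transmission, whether the resulting configuration *necessarily* evolves into such a cycle.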
"Array Dataflow Analysis for Explicitly Parallel Programs"
J. Collard, M. Griebl
Parallel Processing Letters, pp. 406-413. Pub Date: 1996-08-26. DOI: 10.1142/S0129626497000140

This paper describes a dataflow analysis of array data structures for data-parallel and/or control- (or task-) parallel imperative languages. The analysis departs from previous work in that it (1) handles both parallel programming paradigms simultaneously, and (2) does not rely on the usual iterative solution of a set of dataflow equations but instead extends array dataflow analysis based on integer linear programming, improving the precision of the results.
"Parallel Creation of Linear Octrees from Quadtree Slices"
L. K. Swift, T. Johnson, P. Livadas
Parallel Processing Letters, pp. 519-522. Pub Date: 1994-12-01. DOI: 10.1007/978-4-431-68456-5_42
"On Self-Stabilizing Wait-Free Clock Synchronization"
M. Papatriantafilou, P. Tsigas
Parallel Processing Letters, pp. 267-277. Pub Date: 1994-07-06. DOI: 10.1142/S0129626497000334

Clock synchronization algorithms that can tolerate any number of processors failing by ceasing operation for an unbounded number of steps and then resuming (with or without knowing that they were faulty) are called wait-free. If such algorithms also work correctly when the starting state of the system is arbitrary, they are called wait-free, self-stabilizing. This work deals with wait-free, self-stabilizing clock synchronization of n processors in an "in-phase" multiprocessor system and presents a solution with quadratic synchronization time; the best previous solution has cubic synchronization time. The algorithm is based on a simple analysis of the difficulties of the problem, which showed how to "re-parametrize" the previously mentioned cubic algorithm to obtain the quadratic-time solution. Both the protocol and its analysis are intuitive and easy to understand.
"R2M: A Reconfigurable Rewrite Machine"
R. Ramesh
Parallel Processing Letters, pp. 171-180. Pub Date: 1994-06-01. DOI: 10.1142/S0129626494000181

Term rewriting is a popular computational paradigm for symbolic computations such as formula manipulation, theorem proving, and the implementation of nonprocedural programming languages. In rewriting, the most demanding operation is the repeated simplification of terms by pattern matching them against rewrite rules. We describe a parallel architecture, R2M, for accelerating this operation. R2M can operate either as a stand-alone processor using its own memory or as a backend device attached to a host using the host's main memory. R2M uses only a fixed number of processing units (independent of input size) and fixed-capacity auxiliary memory units, yet it can handle variable-size rewrite rules that change during simplification. This is made possible by a simple, reconfigurable interconnection within R2M. Finally, R2M uses a hybrid scheme that combines the ease and efficiency of parallel pattern matching on the tree representation of terms with the naturalness of the dag representation for replacements.
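The core operation R2M accelerates in hardware can be stated compactly in software. Below is a minimal sequential term rewriter; the tuple term representation, the `?`-prefixed variable convention, and the Peano-addition rules are our illustrative choices, not R2M's.

```python
# Terms are atoms (strings) or tuples ("f", arg1, ...). Strings starting
# with "?" are pattern variables. `rewrite` repeatedly matches subterms
# against rules and replaces them until no rule applies.

def match(pattern, term, env):
    """Try to bind pattern variables so that pattern equals term."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env[pattern] == term
        env[pattern] = term
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        return all(match(p, t, env) for p, t in zip(pattern, term))
    return pattern == term

def substitute(term, env):
    """Replace bound variables in term with their values."""
    if isinstance(term, str) and term in env:
        return env[term]
    if isinstance(term, tuple):
        return tuple(substitute(t, env) for t in term)
    return term

def rewrite(term, rules):
    """Simplify to normal form (assumes the rule set terminates)."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            env = {}
            if match(lhs, term, env):
                term, changed = substitute(rhs, env), True
        if not changed and isinstance(term, tuple):
            sub = tuple(rewrite(t, rules) for t in term)
            if sub != term:
                term, changed = sub, True
    return term

# Peano addition: add(0, y) -> y ; add(s(x), y) -> s(add(x, y))
rules = [(("add", "0", "?y"), "?y"),
         (("add", ("s", "?x"), "?y"), ("s", ("add", "?x", "?y")))]
print(rewrite(("add", ("s", "0"), ("s", "0")), rules))  # ('s', ('s', '0'))
```

The point of the hardware design is that the matching step inside this loop, which dominates the cost, is done in parallel across the term.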
"A Theory to Increase the Effective Redundancy in Wormhole Networks"
J. Duato
Parallel Processing Letters, pp. 277-288. Pub Date: 1993-09-13. DOI: 10.1142/S0129626494000144

Fault-tolerant systems aim at providing continuous operation in the presence of faults. Multicomputers rely on an interconnection network between processors to support the message-passing mechanism, so the reliability of the interconnection network is very important to the reliability of the whole system. This paper analyses the effective redundancy available in a wormhole network by combining connectivity and deadlock freedom. Redundancy is defined at the channel level: a sufficient condition is given for a channel to be redundant, and the set of redundant channels is computed. The redundancy level of the network is also defined, together with a theorem that supplies a lower bound for it. Finally, a fault-tolerant routing algorithm based on this theory is proposed.
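One ingredient of the channel-level condition can be illustrated in a few lines: a channel can only be redundant if every source-destination pair remains reachable without it. The toy check below deliberately ignores the other half of the paper's condition (that the surviving routing function must remain deadlock free); the ring topology and channel encoding are our example.

```python
# A channel is a directed edge (src, dst). `connected_without` tests
# whether every ordered node pair is still reachable after one channel
# fails -- a necessary (not sufficient) condition for redundancy.
from itertools import permutations

def connected_without(nodes, channels, removed):
    """True iff all ordered node pairs stay reachable without `removed`."""
    live = [c for c in channels if c != removed]
    adj = {n: [d for (s, d) in live if s == n] for n in nodes}

    def reachable(src, dst):
        seen, stack = {src}, [src]
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return False

    return all(reachable(s, d) for s, d in permutations(nodes, 2))

nodes = [0, 1, 2, 3]
# Bidirectional 4-node ring: a unidirectional channel in each direction.
ring = [(i, (i + 1) % 4) for i in range(4)] + \
       [((i + 1) % 4, i) for i in range(4)]
print(connected_without(nodes, ring, (0, 1)))   # True: a detour survives
```

In a unidirectional ring the same test fails for every channel, matching the intuition that such a network has no effective redundancy at all.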
"Bit-Level Systolic Arrays for Digital Contour Smoothing by Abel-Poisson Kernel"
J. Glasa
Parallel Processing Letters, pp. 105-120. Pub Date: 1993-03-01. DOI: 10.1142/S0129626493000071

We propose two different bit-level systolic arrays for digital contour smoothing by the Abel-Poisson kernel that minimize the execution time and the number of functional elements required. The arrays are fully pipelined at the bit level, achieving very high clock frequencies; they are implementable in VLSI and are suited to real-time applications.
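The arithmetic the systolic arrays pipeline at bit level can be sketched in ordinary floating point: Abel-Poisson smoothing damps the k-th Fourier coefficient of a closed contour by r^|k| for some 0 < r < 1 (r = 1 leaves the contour unchanged). The naive O(n^2) DFT below is our dependency-free illustration, not the paper's fixed-point hardware design; parameter names are ours.

```python
# Smooth a closed digital contour by the Abel-Poisson kernel: take the
# DFT of the contour points (as complex numbers), scale coefficient k by
# r**|k| using signed frequencies, and transform back.
import cmath

def abel_poisson_smooth(points, r):
    """points: list of complex contour points (closed curve); 0 < r <= 1."""
    n = len(points)
    coeffs = [sum(points[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                  for j in range(n)) / n
              for k in range(n)]
    smoothed = []
    for j in range(n):
        acc = 0
        for k in range(n):
            freq = k if k <= n // 2 else k - n        # signed frequency
            acc += coeffs[k] * (r ** abs(freq)) \
                   * cmath.exp(2j * cmath.pi * k * j / n)
        smoothed.append(acc)
    return smoothed

# A square contour becomes rounder as r decreases; its centroid
# (coefficient k = 0) is preserved exactly.
square = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)]
rounded = abel_poisson_smooth(square, 0.5)
```

A real-time implementation replaces this DFT by fixed-point convolution with precomputed kernel weights, which is the computation the bit-level arrays lay out in silicon.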
"Scheduling a Scattering-Gathering Sequence on Hypercubes"
Henri-Pierre Charles, Pierre Fraigniaud
Parallel Processing Letters, pp. 29-42. Pub Date: 1993-03-01. DOI: 10.1142/S012962649300006X

The scattering problem is related to the gossiping and broadcasting problems [1, 2]. It consists of distributing a set of data from a single source such that each component is sent to a distinct address; the gathering operation is the reverse of scattering. This paper studies the problem of pipelining a scattering-gathering sequence in order to overlap the two operations. We first give a general solution for distributed-memory parallel computers and then study the problem on hypercubes in particular.
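The baseline that a pipelined scattering-gathering schedule improves on is the standard recursive-halving scatter: on a d-dimensional hypercube, at each of d steps every holder forwards the half of its buffered blocks destined for the opposite subcube. The simulation below is our own sketch of that classic schedule, not the paper's pipelined variant.

```python
# Simulate recursive-halving scatter on a hypercube of p = 2**d ranks.
# `holding[r]` is the set of destination blocks currently buffered at
# rank r; after d steps every rank holds exactly its own block.

def hypercube_scatter(d, root=0):
    p = 1 << d
    holding = {root: set(range(p))}
    for step in reversed(range(d)):            # highest dimension first
        for src in list(holding):              # snapshot: one send per step
            partner = src ^ (1 << step)
            # blocks whose destination lies in the partner's subcube
            move = {b for b in holding[src]
                    if (b >> step) & 1 != (src >> step) & 1}
            if move:
                holding[src] -= move
                holding.setdefault(partner, set()).update(move)
    return holding

final = hypercube_scatter(3)
print(final)   # each of the 8 ranks ends up holding exactly its own block
```

Gathering runs the same schedule in reverse; the pipelining question the paper studies is how to interleave the two so the links freed by the scatter are reused by the gather.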