Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651479
Y. Tseng
Consider a hypercube or a star graph in which an unknown number of nodes, located at unknown positions, each intend to broadcast a message. We propose an efficient routing algorithm that solves this multi-node broadcasting problem in asymptotically optimal or near-optimal transmission time.
Title: Multi-node broadcasting in hypercubes and star graphs. Published in: Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing.
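For orientation, the classic single-source broadcast on a hypercube can be sketched as follows. This is not the multi-node algorithm of the paper above, whose details the abstract does not give; it is the textbook binomial-tree broadcast, in which every informed node forwards the message across one new dimension per step. The function name and the node numbering (integers whose bits are the hypercube coordinates) are assumptions made for this illustration.

```python
def hypercube_broadcast_schedule(dim, source=0):
    """Return, for each step, the (sender, receiver) pairs of a
    single-source broadcast on a dim-dimensional hypercube.

    Illustrative sketch only: nodes are the integers 0 .. 2**dim - 1,
    and two nodes are neighbours when their labels differ in one bit.
    """
    informed = {source}
    schedule = []
    for bit in range(dim):
        # Every informed node sends to its neighbour across dimension `bit`.
        step = [(node, node ^ (1 << bit)) for node in sorted(informed)]
        informed.update(receiver for _, receiver in step)
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    for number, step in enumerate(hypercube_broadcast_schedule(3, source=5), 1):
        print(f"step {number}: {step}")
```

The number of informed nodes doubles at every step, so one message reaches all 2^dim nodes in dim steps. With several sources broadcasting at once, these trees compete for the same links, which is the situation the paper addresses.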
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651534
K.H. Liu, C. Leung, Y. Jiang
Multiple processors are employed to improve the performance of database systems, and parallelism can be exploited at three levels in query processing: intra-operation, inter-operation, and inter-query parallelism. Intra-operation and inter-operation parallelism are together called intra-query parallelism, which has been studied extensively. In contrast, inter-query parallelism has received little attention, particularly for multiple dependent queries. We develop a decompression algorithm, CPS, for handling multiple dependent queries represented by a directed graph. The algorithm draws on the activity analysis of critical path analysis and on the resource scheduling and levelling techniques of project management. A simulation study shows that the proposed algorithm outperforms existing methods and provides a globally optimal solution when enough processors are available.
Title: Multiple dependent queries execution using critical path scheduling in parallel databases
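The critical path analysis that the abstract above builds on can be illustrated with a small sketch. It is not the CPS algorithm itself: it only computes the earliest finish time of each query in a dependency graph and recovers the longest (critical) chain, ignoring processor limits. The query names, durations, and function name are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the critical path in a dependency DAG.

    durations: {task: positive execution time}
    deps:      {task: set of tasks that must finish first}
    Illustrative sketch: assumes unlimited processors.
    """
    graph = {task: deps.get(task, set()) for task in durations}
    earliest_finish = {}
    critical_pred = {}
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(graph).static_order():
        start, pred = 0, None
        for p in graph[task]:
            if earliest_finish[p] > start:
                start, pred = earliest_finish[p], p
        earliest_finish[task] = start + durations[task]
        critical_pred[task] = pred
    # Walk back from the task that finishes last.
    node = max(earliest_finish, key=earliest_finish.get)
    length, path = earliest_finish[node], []
    while node is not None:
        path.append(node)
        node = critical_pred[node]
    return length, path[::-1]


if __name__ == "__main__":
    durations = {"Q1": 3, "Q2": 2, "Q3": 4, "Q4": 1}
    deps = {"Q3": {"Q1", "Q2"}, "Q4": {"Q3"}}
    print(critical_path(durations, deps))  # (8, ['Q1', 'Q3', 'Q4'])
```

The critical path length is a lower bound on the completion time of the whole query set, which is why a schedule that meets it when enough processors are available is globally optimal.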
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651508
Tsung-Chuan Huang, Po-Hsueh Hsu, Tze-Nan Sheng
We propose an efficient run-time technique to find an optimal parallel execution schedule for partially parallel loops, in which synchronization between iterations is needed to ensure correct program semantics. For efficiency, we combine the conventional mark phase and scheduler phase into a single parallel scheduler. The scheduler divides the loop iterations into several chunks and then executes the iterations within each chunk in parallel. Our scheme not only runs fast but also achieves an optimal schedule. In addition, an atomic bit-vector operation is introduced to avoid global synchronization overhead and to ensure that the larger wavefront number is kept when the wavefront number of an iteration is updated concurrently during scheduling.
Title: Efficient run-time scheduling for parallelizing partially parallel loops
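The notion of a wavefront number used in the abstract above can be made concrete with a sequential sketch. It is not the parallel scheduler or the atomic bit-vector operation of the paper; it is a plain inspector for a hypothetical loop of the form `A[writes[i]] = f(A[reads[i]])`, assigning each iteration the smallest wavefront that respects all flow, anti, and output dependences on earlier iterations. All names are assumptions for illustration.

```python
def wavefronts(writes, reads):
    """Assign a wavefront number to every iteration of a loop in which
    iteration i reads element reads[i] and writes element writes[i].

    Sequential illustrative sketch; iterations that share a wavefront
    number have no dependences among them.
    """
    last_write_wf = {}  # element -> wavefront of the latest writer
    last_read_wf = {}   # element -> largest wavefront among its readers
    result = []
    for w, r in zip(writes, reads):
        level = 1 + max(
            last_write_wf.get(r, 0),  # flow dependence (read after write)
            last_write_wf.get(w, 0),  # output dependence (write after write)
            last_read_wf.get(w, 0),   # anti dependence (write after read)
        )
        result.append(level)
        last_read_wf[r] = max(last_read_wf.get(r, 0), level)
        last_write_wf[w] = level
    return result


def group_by_wavefront(wf):
    """Group iteration indices by wavefront number, in execution order."""
    groups = {}
    for iteration, level in enumerate(wf):
        groups.setdefault(level, []).append(iteration)
    return [groups[level] for level in sorted(groups)]


if __name__ == "__main__":
    wf = wavefronts(writes=[0, 1, 0, 2, 1], reads=[3, 0, 4, 1, 2])
    print(wf)                      # [1, 2, 3, 3, 4]
    print(group_by_wavefront(wf))  # [[0], [1], [2, 3], [4]]
```

Each group can run in parallel, with a barrier between consecutive groups. The `max` in the update of `last_read_wf` is the sequential counterpart of keeping the larger wavefront number when concurrent updates collide.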
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651533
R. Tan, V. Lakshmi Narasimhan
In this paper, LSOM (Load-balancing Self-Organizing Map), a neural network based on Kohonen's self-organizing map, is proposed for the problem of mapping finite-element method (FEM) grids onto distributed-memory parallel computers with mesh interconnection networks. The rough global ordering produced by LSOM is then refined locally with the Kernighan-Lin algorithm (the combination is called LSOM-KL) to obtain the solution. LSOM-KL achieved a load imbalance of less than 0.1% and a low number of hops, comparable to the results of commonly used recursive bisection methods.
Title: Mapping of finite-element grids onto parallel computers using neural networks
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651513
Weiping Zhu, Tyng-Yeu Liang, C. Shieh
To achieve high performance in a distributed system, the tasks of a program have to be carefully clustered and assigned to processors. In this paper, we present a static method to cluster tasks and allocate them to processors. The proposed method relies on a Hopfield neural network to achieve optimal or near-optimal task clustering in terms of load balancing and communication cost. Experimental studies show that the method can indeed find optimal or near-optimal mappings for the programs used in our tests.
Title: Optimal task clustering using Hopfield net
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651520
M. Bertozzi, A. Broggi, A. Fascioli
This paper presents a real-time solution to the problem of obstacle detection in automotive applications using image processing techniques. To speed up the processing, a massively parallel engine is used and the algorithms are tuned to match the specific features of the computing architecture. The system acquires pairs of stereo images, checks for correspondences, and remaps the resulting image into a new domain to ease the subsequent processing steps. All processing is performed on PAPRICA-3, a massively parallel system whose processing elements are arranged in a linear array; the proposed system can reach video-rate performance.
Title: Real-time obstacle detection on a massively parallel linear architecture
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651495
H. Guyennet, J. Lapayre, M. Tréhel
We propose Pilgrim, a new consistency protocol for distributed shared memory (DSM) in which different shared objects are replicated at each site. The protocol provides both reliable consistency and guaranteed performance. We discuss the protocol, prove it with a finite state automaton, and demonstrate its qualities.
Title: The Pilgrim: a new consistency protocol for distributed shared memory
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651526
A. Zaknich, C.J.S. de Silva
The modified probabilistic neural network was initially derived from Specht's (1990) probabilistic neural network classifier and developed for nonlinear time series analysis. It can be described as a vector quantised reduced form of Specht's general regression neural network. It is typically trained with a known set of representative data pairs. This is quite satisfactory for stationary data statistics, but for the nonstationary case it is necessary to be able to adapt the network during operation. This paper describes adaptive learning schemes for the modified probabilistic neural network for both stationary and nonstationary data statistics. A nonlinear control problem is used to illustrate and compare the network's learning ability with that of the general regression and radial basis function neural networks.
Title: Adaptive learning schemes for the modified probabilistic neural network
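To make the relationship between the two networks in the abstract above concrete, the following sketch implements Gaussian kernel regression in the form commonly given for Specht's general regression neural network, plus a count-weighted variant over cluster centres as one reading of "vector quantised reduced form". The adaptive learning schemes of the paper are not shown. The function name, the `counts` parameter, and the single shared bandwidth `sigma` are assumptions made for this illustration.

```python
import math


def kernel_regression(x, centres, outputs, counts=None, sigma=1.0):
    """Estimate y(x) as a Gaussian-weighted average of stored outputs.

    centres: stored input vectors (training points or cluster centres)
    outputs: scalar output associated with each centre
    counts:  number of training vectors represented by each centre;
             None means one per centre (the unquantised case)
    Illustrative sketch: no guard against all weights underflowing
    to zero when x is far from every centre.
    """
    if counts is None:
        counts = [1] * len(centres)
    numerator = denominator = 0.0
    for centre, y, z in zip(centres, outputs, counts):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, centre))
        weight = z * math.exp(-dist2 / (2.0 * sigma ** 2))
        numerator += weight * y
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    centres = [[0.0], [1.0]]
    outputs = [0.0, 2.0]
    # Midway between two equally weighted centres: the plain average.
    print(kernel_regression([0.5], centres, outputs))
    # The second centre now stands for three training vectors.
    print(kernel_regression([0.5], centres, outputs, counts=[1, 3]))
```

Quantising the training set into a few centres keeps the cost of each estimate proportional to the number of centres rather than the number of training pairs, and updating centres and counts during operation is one way such a network could track nonstationary data.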
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651483
J. Kneip
The paper describes the concept and implementation of a data cache architecture with concurrent, conflict-free access to shared data for DSPs with parallel, synchronized processing units. It uses techniques known from object-oriented software design to achieve efficient and programmer-friendly on-chip storage of data. Internally, the cache uses virtual 1D or 2D address spaces directly assigned to data structures instead of a conventional linear address space. Data within the cache are distributed over a number of memory banks. Virtual local addresses are used for data location and hit/miss detection to minimize cost and memory latency. The object-oriented cache is fully transparent to programmer and compiler, reduces the number of address calculations to be performed, exploits the 2D spatial locality typical of image processing algorithms, and can be integrated into a standard RISC processor pipeline.
Title: An object-oriented data cache architecture for programmable parallel digital signal processors
Pub Date: 1997-12-10; DOI: 10.1109/ICAPP.1997.651515
D. Das, P. Dasgupta, P. Das
In this paper, we devise a new method for transparent fault tolerance of distributed programs running on a cluster of networked workstations, based on the concept of alternative schedules. Such schedules are generated from static task graphs at compile time. At run time, a distributed program can use these alternatives to switch from one schedule to another if one or more machines become faulty. We have devised fast and efficient mechanisms for switching among schedules at run time. This enables recovery from any number of simultaneous machine faults, any number of times. The correctness of the resulting algorithm is ensured by preventing direct data sharing among local tasks on a machine. Such a transparent fault-tolerant strategy is easily implementable on a network of workstations running PVM-like software.
Title: A new method for transparent fault tolerance of distributed programs on a network of workstations using alternative schedules