Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574031
D. Feng, Xinrong Zhou, Hai Jin, Jiangling Zhang
A stochastic Petri net (SPN) model of RAID-5 is constructed. Using the model and its isomorphic Markov chain, the average utilization of the disk drives in a RAID can be calculated for small-write and large I/O requests. This provides a good method for evaluating RAID performance.
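The abstract does not give the SPN or its Markov chain, so the following is only a minimal sketch of the general idea of reading utilization off a steady-state distribution. It assumes a much simpler stand-in: one disk modeled as an M/M/1/K birth-death chain. The function name and the parameters `arrival_rate`, `service_rate`, and `queue_capacity` are illustrative assumptions, not quantities from the paper.

```python
def disk_utilization(arrival_rate, service_rate, queue_capacity):
    """Utilization of one disk modeled as an M/M/1/K birth-death chain.

    This is an assumed toy model, not the RAID-5 SPN from the paper.
    State n is the number of requests at the disk, 0 <= n <= queue_capacity.
    """
    rho = arrival_rate / service_rate
    # Steady-state probabilities of a birth-death chain with constant
    # rates are proportional to rho**n.
    weights = [rho ** n for n in range(queue_capacity + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    # The disk is busy in every state except the empty one.
    return 1.0 - pi[0]


if __name__ == "__main__":
    print(disk_utilization(arrival_rate=1.0, service_rate=2.0, queue_capacity=4))
```

For a real SPN the chain would be derived from the reachability graph of the net, and the steady state would come from solving the balance equations rather than from a closed form.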
{"title":"Utilization of disk drives for RAID","authors":"D. Feng, Xinrong Zhou, Hai Jin, Jiangling Zhang","doi":"10.1109/APDC.1997.574031","DOIUrl":"https://doi.org/10.1109/APDC.1997.574031","url":null,"abstract":"A stochastic Petri nets (SPN) model of RAID-5 is constructed. With the model and its isomorphic Markov chain, the average utilization of disk drives in RAID for small write and large I/O request can be calculated. It provides us a good method to evaluate the performance of RAID in the paper.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127907044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574028
Hong Shen
We present a fast parallel algorithm running in O(log^2 n) time on a CREW PRAM with O(n) processors for finding the kth longest path in a given tree of n vertices (with Θ(n^2) intervertex distances). Our algorithm is obtained by efficiently parallelizing a sequential algorithm that is a variant of both N. Megiddo et al.'s algorithm and G.N. Frederickson et al.'s algorithm, based on centroid decomposition of the tree and a succinct representation of the set of intervertex distances. With the same time and space bounds as the best known result, our sequential algorithm maintains a shorter decomposition tree.
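To make the problem statement concrete, here is a brute-force sequential baseline in Python. It is not the paper's algorithm: it simply enumerates all Θ(n^2) intervertex distances and sorts them, which the centroid-decomposition approach is designed to avoid. The function name and the edge-list input format are assumptions made for this sketch.

```python
def kth_longest_path(n, edges, k):
    """Return the k-th largest intervertex distance in a weighted tree.

    n     -- number of vertices, labeled 0..n-1
    edges -- list of (u, v, weight) tuples forming a tree
    k     -- 1-based rank among all n*(n-1)/2 intervertex distances
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    distances = []
    for source in range(n):
        # Depth-first traversal; in a tree the path to each vertex is unique.
        dist = {source: 0}
        stack = [source]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        # Count each unordered pair once.
        distances.extend(d for v, d in dist.items() if v > source)

    distances.sort(reverse=True)
    return distances[k - 1]


if __name__ == "__main__":
    # Path 0 -(1)- 1 -(2)- 2: distances are 1, 2, and 3.
    print(kth_longest_path(3, [(0, 1, 1), (1, 2, 2)], 2))
```

A baseline like this costs O(n^2 log n) time and is mainly useful for checking a faster implementation on small inputs.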
{"title":"Fast parallel algorithm for finding the kth longest path in a tree","authors":"Hong Shen","doi":"10.1109/APDC.1997.574028","DOIUrl":"https://doi.org/10.1109/APDC.1997.574028","url":null,"abstract":"We present a fast parallel algorithm running in O(log/sup 2/n) time on a CREW PRAM with O(n) processors for finding the kth longest path in a given tree of n vertices (with /spl Theta/(n/sup 2/) intervertex distances). Our algorithm is obtained by efficient parallelization of a sequential algorithm which is a variant of both N. Megiddo et al.'s algorithm and G.N. Fredrickson et al.'s algorithm based on centroid decomposition of tree and succinct representation of the set of intervertex distances. With the same time and space bound as the best known result, our sequential algorithm maintains a shorter length of the decomposition tree.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131715595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574042
Jiannong Cao, W. Jia, X. Jia, T. Cheung
A synchronous checkpointing algorithm coordinates a set of processes in taking checkpoints in such a way that the set of local checkpoints always forms part of a consistent global system state. Whenever a process p requests to take a checkpoint, a set of processes, called the cohorts set of p, must be checked, and some of them may also have to take checkpoints in order to preserve system consistency. Although several synchronous checkpointing algorithms have been proposed in the literature, most of them do not address performance. In this paper we propose an efficient distributed algorithm for synchronous checkpointing. A proof of correctness and an analysis of the algorithm's efficiency are presented. It is shown that the algorithm has better message and time complexity than existing algorithms. The proposed method can also be applied to improve the performance of rollback operations, which always require synchronization of the interdependent processes.
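The notion of a consistent global state used above can be illustrated with a small check in Python. This is not the paper's coordination algorithm; it only tests the standard consistency condition that no message is recorded as received without also being recorded as sent. The event-index representation and all names below are assumptions made for this sketch.

```python
def is_consistent(checkpoints, messages):
    """Check whether a set of local checkpoints forms a consistent cut.

    checkpoints -- dict mapping process id to the index of the last local
                   event included in that process's checkpoint
    messages    -- iterable of (sender, send_event, receiver, recv_event),
                   where the event fields are local event indices
    """
    for sender, send_event, receiver, recv_event in messages:
        received_before_cut = recv_event <= checkpoints[receiver]
        sent_after_cut = send_event > checkpoints[sender]
        if received_before_cut and sent_after_cut:
            # Orphan message: the receive is saved but the send is not.
            return False
    return True


if __name__ == "__main__":
    msgs = [("p", 3, "q", 2)]
    print(is_consistent({"p": 2, "q": 3}, msgs))  # q saved a receive p never saved sending
    print(is_consistent({"p": 3, "q": 3}, msgs))
```

In this picture, a coordinated algorithm forces additional processes to checkpoint until no orphan message remains.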
{"title":"Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systems","authors":"Jiannong Cao, W. Jia, X. Jia, T. Cheung","doi":"10.1109/APDC.1997.574042","DOIUrl":"https://doi.org/10.1109/APDC.1997.574042","url":null,"abstract":"A synchronous checkpointing algorithm coordinates a set of processes in taking checkpoints in such a way that the set of local checkpoints always forms part of a consistent global system state. Whenever a process p requests to take a checkpoint, a set of processes, called the cohorts set of p, must be checked and some of them may also have to take their checkpoints in order to preserve system consistency. Although several synchronous checkpointing algorithms have been proposed in the literature, most of them do not address the performance issue. In this paper we propose an efficient distributed algorithm for synchronous checkpointing. Proof of correctness and analysis of efficiency of the algorithm are presented. It is shown that the algorithm has a better message and time complexity than the existing algorithms. The method proposed in this paper can also be applied to enhance the performance of rollback operation which always require synchronization of the inter-dependent processes.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126361382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574032
Y. Dou, Zhengbin Pang, Xingming Zhou
This paper introduces GKD-VSM, a software virtual shared memory on PVM. It provides a shared-memory parallel programming model in FORTRAN for distributed-memory environments. To reduce software overhead, GKD-VSM takes several approaches, including a special-purpose user-level multithreading scheme and a Prefetch&Poststore scheme at synchronization points. The latencies of basic operations are presented.
{"title":"Implementing a software virtual shared memory on PVM","authors":"Y. Dou, Zhengbin Pang, Xingming Zhou","doi":"10.1109/APDC.1997.574032","DOIUrl":"https://doi.org/10.1109/APDC.1997.574032","url":null,"abstract":"This paper introduces a software virtual shared memory, GKD-VSM on PVM. It provides a shared memory parallel programming model in FORTRAN language for distributed memory environments. To reduce the software overhead GKD-VSM takes several approaches, including special-purposed user-level multithread scheme and Prefetch&Poststore at synchronization points scheme. The latencies for basic operations are presented.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127594644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574023
Sangyoon Lee, Chan-Ik Park, Chan-Mo Park
Delaunay triangulation is widely used in applications such as volume rendering, shape representation, and terrain modeling. Its main disadvantage is the large computation time required to triangulate an input point set. This time can be reduced by using more than one processor, and several parallel algorithms for Delaunay triangulation have been proposed. In this paper, we propose an improved parallel algorithm for Delaunay triangulation, which partitions the bounding convex region of the input point set into a number of regions by using Delaunay edges and generates the Delaunay triangles in each region by an incremental construction approach. Partitioning by Delaunay edges makes it possible to eliminate the merging step otherwise required to integrate subresults. Experiments show that the proposed algorithm has good load balance and is more efficient than Cignoni et al.'s algorithm (1993) and our previous algorithm (1996).
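Incremental Delaunay construction rests on the empty-circumcircle property: no input point lies strictly inside the circumcircle of any triangle. The Python sketch below shows the standard determinant form of that predicate for 2-D points. It is a generic building block, not code from the paper, and it ignores the robustness issues that floating-point inputs raise.

```python
def in_circumcircle(a, b, c, d):
    """Return True if point d lies strictly inside the circumcircle of a, b, c.

    Points are (x, y) pairs; a, b, c must be in counter-clockwise order.
    """
    # Translate so that d is the origin.
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]

    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy

    # 3x3 determinant expanded along the column of squared lengths.
    det = (a2 * (bx * cy - cx * by)
           - b2 * (ax * cy - cx * ay)
           + c2 * (ax * by - bx * ay))
    return det > 0


if __name__ == "__main__":
    print(in_circumcircle((0, 0), (4, 0), (0, 4), (1, 1)))
    print(in_circumcircle((0, 0), (4, 0), (0, 4), (5, 5)))
```

A triangle passes the Delaunay test when this predicate is false for every other input point.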
{"title":"An improved parallel algorithm for Delaunay triangulation on distributed memory parallel computers","authors":"Sangyoon Lee, Chan-Ik Park, Chan-Mo Park","doi":"10.1109/APDC.1997.574023","DOIUrl":"https://doi.org/10.1109/APDC.1997.574023","url":null,"abstract":"Delaunay triangulation has been much used in such applications as volume rendering, shape representation, terrain modeling and so on. The main disadvantage of Delaunay triangulation is large computation time required to obtain the triangulation on an input points set. This time can be reduced by using more than one processor, and several parallel algorithms for Delaunay triangulation have been proposed. In this paper, we propose an improved parallel algorithm for Delaunay triangulation, which partitions the bounding convex region of the input points set into a number of regions by using Delaunay edges and generates Delaunay triangles in each region by applying an incremental construction approach. Partitioning by Delaunay edges makes it possible to eliminate merging step required for integrating subresults. It is shown from the experiments that the proposed algorithm has good load balance and is more efficient than Cignoni et al.'s algorithm (1993) and our previous algorithm (1996).","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127820128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574017
Weigang Yuan, Yongqiang Sun
This paper presents a new structured parallel programming model, "SEQ OF PAR", based on the Communication Closed Layer (CCL) principle of causal composition for parallel programs and the Bird-Meertens formalism (BMF) of locality-based parallel computation. The model is intended to support more general, architecture-independent parallel programming. It provides a structured approach to integrating task (or process) parallelism and data parallelism in one framework. The well-founded algebra of CCL and BMF also makes it possible to derive, optimize, and verify parallel programs through algebraic transformations. Experimental results show that this programming model is very promising for obtaining efficient, portable parallel code.
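The kind of algebraic transformation that BMF supports can be illustrated with the familiar promotion law for list homomorphisms: reducing a mapped list gives the same result as reducing the partial results of its chunks, provided the operator is associative and `unit` is its identity. The Python sketch below is a generic illustration of that law, not the "SEQ OF PAR" notation itself, and the function names are assumptions.

```python
from functools import reduce
import operator


def map_reduce(f, op, unit, xs):
    """Reference version: reduce op over f applied to every element."""
    return reduce(op, map(f, xs), unit)


def chunked_map_reduce(f, op, unit, xs, parts):
    """Same result computed chunk by chunk.

    Each chunk could be handled by a separate worker, because op is
    assumed associative with identity unit; here they run sequentially.
    """
    size = max(1, -(-len(xs) // parts))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [map_reduce(f, op, unit, chunk) for chunk in chunks]
    return reduce(op, partials, unit)


if __name__ == "__main__":
    data = list(range(1, 11))
    square = lambda x: x * x
    print(map_reduce(square, operator.add, 0, data))
    print(chunked_map_reduce(square, operator.add, 0, data, parts=3))
```

Both calls compute the same sum of squares, which is the equality a derivation in this style would rely on when splitting work across processors.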
{"title":"\"SEQ OF PAR\" style structured parallel programming","authors":"Weigang Yuan, Yongqiang Sun","doi":"10.1109/APDC.1997.574017","DOIUrl":"https://doi.org/10.1109/APDC.1997.574017","url":null,"abstract":"This paper presents a new structured parallel programming model, \"SEQ OF PAR\", based on the Communication Closed Layer (CCL) principle of causal composition for parallel programs and Bird-Meertens formalism (BMF) of locality-based parallel computation. This model is to support for more general, architecture-independent parallel programming. It provides a structured approach to integrate task (or process) parallelism and data-parallelism in one framework. The well-founded algebra of CCL and BMF makes it also possible to derive, optimize and verify parallel programs through algebraic transformations. Experimental results show that it is very promising to adopt this programming model for getting efficient, portable parallel code.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133124104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574033
Roger Butenuth
Although highly parallel distributed-memory computers have existed for several years, the operating systems used on them have not fit their requirements very well. Most of them are designed for sequential, shared-memory parallel, or distributed computers. Examples are Unix on the IBM SP/2 and Mach on the Intel Paragon. This results in poor scalability, caused by inefficient communication primitives designed for wide area networks or by waste of resources due to huge kernels (e.g. 8 MB per node are reported for Mach on the Paragon), which is especially harmful in highly parallel systems with hundreds or thousands of nodes. With Cosy (Concurrent Operating System) we have shown that a well-structured and carefully designed system can be small (70 Kb for the kernel, 372 total memory usage per node), efficient (33 μs for communication), and scalable (applications run efficiently on up to 1024 processors).
{"title":"Small, scalable, and efficient, microkernels for highly parallel computers are possible: Cosy as an example","authors":"Roger Butenuth","doi":"10.1109/APDC.1997.574033","DOIUrl":"https://doi.org/10.1109/APDC.1997.574033","url":null,"abstract":"Although highly parallel distributed memory computers exist for several years, the operating systems used on them did not fit the requirements very well. Most of them are designed for sequential, shared memory parallel or distributed computers. Examples are Unix on the IBM SP/2 and Mach on the Intel Paragon. This results in poor scalability caused by inefficient communication primitives designed for wide area networks or by waste of resources due to huge kernels (e.g. 8 MB per node are reported for Mach an the Paragon, which is harmful especially in highly parallel systems with hundreds or thousands of nodes. With Cosy (Concurrent Operating System) we have shown that a well structured and carefully designed system can be small (70 Kb for the kernel 372 total memory usage per node), efficient (33 /spl mu/s for communication), and scalable (applications run efficient on up to 1024 processors).","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124673280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574010
Huiwei Guan, T. Cheung, Chi-Kwong Li, Songnian Yu
A parallel design and implementation of the Self-Organizing Map (SOM) neural computing model is proposed. The design is implemented in a Parallel Virtual Machine (PVM) environment on a distributed system. A practical realization of the SOM algorithm is investigated, and the construction of the computing module in PVM is discussed. Communication methods and an optimization of message passing between multiple processes are proposed, and the parallel programming technique and a PVM implementation of the SOM model are discussed in detail.
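For readers unfamiliar with SOM, the core training step that such an implementation has to parallelize can be sketched as follows. This is a plain sequential version in Python for a one-dimensional map with a Gaussian neighborhood; the paper's PVM decomposition and message-passing scheme are not described in the abstract and are not reproduced here. All names and parameters are assumptions.

```python
import math


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
    )


def som_step(weights, x, learning_rate, radius):
    """One SOM update on a 1-D map; returns (bmu_index, new_weights)."""
    bmu = best_matching_unit(weights, x)
    updated = []
    for i, w in enumerate(weights):
        # Gaussian neighborhood: units near the winner move more.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
        updated.append(
            [wj + learning_rate * h * (xj - wj) for wj, xj in zip(w, x)]
        )
    return bmu, updated


if __name__ == "__main__":
    units = [[0.0, 0.0], [1.0, 1.0]]
    print(som_step(units, [0.9, 0.9], learning_rate=0.5, radius=1.0))
```

The search for the best matching unit and the per-unit weight updates are independent across units, which is what makes the model a natural candidate for distribution over multiple processes.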
{"title":"Parallel design and implementation of SOM neural computing model in PVM environment of a distributed system","authors":"Huiwei Guan, T. Cheung, Chi-Kwong Li, Songnian Yu","doi":"10.1109/APDC.1997.574010","DOIUrl":"https://doi.org/10.1109/APDC.1997.574010","url":null,"abstract":"A parallel design and implementation of the Self-Organizing Map (SOM) neural computing model is proposed. The parallel design of SOM is implemented in a parallel virtual machine (PVM) environment of a distributed system. A practical realization of SOM algorithm is investigated, the construction of computing module in parallel virtual machine is discussed, the communication methods and an optimization of message passing between multiple processes are proposed, and the parallel programming technique and a PVM implementation of SOM neural computing model are given and discussed in detail.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116443162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574044
Xinda Lu, Qianni Deng, Fei Zheng
In this paper, an approach that uses the least squares method to implement Callback is presented. Experiments showed that Callback is effective not only in high-level metasystems but also in low-level heterogeneous systems.
{"title":"Experiments on heterogeneous scheduling via Callback","authors":"Xinda Lu, Qianni Deng, Fei Zheng","doi":"10.1109/APDC.1997.574044","DOIUrl":"https://doi.org/10.1109/APDC.1997.574044","url":null,"abstract":"In this paper, an approach using least squares method for Callback implementation is presented. Experiments on Callback approved that Callback was not only effective in high-level metasystems but also in low level heterogeneous system.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126838098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574058
Chihong Zhang, Zhizhong Tang
The accuracy of the data dependence analysis of a client program determines the extent to which the compiler can exploit the program's potential parallelism. Most current work on dependence analysis is based on the dependence equation and constraint inequalities on the loop variable bounds (sometimes augmented with the direction vector). Unfortunately, these methods cannot detect dependences exactly, which may greatly affect the parallel optimization of the client program when software pipelining is employed. In this paper, we give a more effective constraint inequality that reflects the characteristics of software pipelining and improves the power of most current dependence analysis algorithms when they are applied to software pipelining.
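As background on what a dependence equation is, the classic GCD test is sketched below in Python. It is a standard, conservative test and not the improved constraint inequality proposed in the paper: for two references A[a*i + c1] and A[b*j + c2], the equation a*i + c1 = b*j + c2 has integer solutions only if gcd(a, b) divides c2 - c1. The function name and argument order are assumptions, and loop bounds are deliberately ignored.

```python
from math import gcd


def gcd_test(a, c1, b, c2):
    """Can A[a*i + c1] and A[b*j + c2] touch the same element?

    Returns False when a dependence is impossible for any integers i, j.
    Returns True when a dependence may exist; loop bounds are not checked,
    so True is only a conservative answer.
    """
    g = gcd(a, b)
    if g == 0:
        # Both subscripts are constants.
        return c1 == c2
    return (c2 - c1) % g == 0


if __name__ == "__main__":
    print(gcd_test(2, 0, 2, 1))  # A[2i] vs A[2j+1]
    print(gcd_test(2, 0, 4, 2))  # A[2i] vs A[4j+2]
```

Bounds-based inequalities such as those discussed in the abstract are what tighten the "may depend" answers that a test like this leaves open.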
{"title":"An improvement on data dependence analysis supporting software pipelining technique","authors":"Chihong Zhang, Zhizhong Tang","doi":"10.1109/APDC.1997.574058","DOIUrl":"https://doi.org/10.1109/APDC.1997.574058","url":null,"abstract":"The accuracy of the data dependence analysis of a client program will decide in what an extent the compiler can unleash the power of the potential parallelism of the client program. Most of the current works on dependence analysis are based on the dependence equation and constraint inequalities of loop variable bounds (sometimes augmented with the direction vector). Unfortunately, they can not give an exact detection on the dependence which may greatly affect the parallel optimization of the client program when software pipelining technique is employed. In the paper, we give a more effective constraint inequality which could reflect the characteristics of software pipelining technique and will improve the power of dependence analysis of most of the current algorithms when applied to software pipelining.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126383898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}