Architectural implications of the NAS MG and FT parallel benchmarks
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574038
Yuzhong Sun, Jianyong Wang, Zhiwei Xu
This paper characterizes the structure and resource requirements of the NAS Parallel Benchmarks (NPB), a popular benchmark suite used to evaluate various parallel computers. The phase parallel model is used to obtain parameter values for memory, I/O, and communication latency and bandwidth requirements. These quantitative parameters are useful in the design and evaluation of various parallel computers. The results of this study are being used in designing Dawning 2000, NCIC's second-generation MPP.
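The abstract does not reproduce the phase parallel model's equations; purely as a hedged sketch of what such a cost model typically looks like (the symbols and the interaction term below are assumptions, not notation taken from the paper):

```latex
% Assumed sketch of a phase-parallel execution-time estimate: the program
% is a sequence of k phases, phase i has sequential work T_i(n) and degree
% of parallelism DOP_i, and phases are separated by interaction
% (communication/synchronization) overhead T_interact,i.
T(n, p) \;=\; \sum_{i=1}^{k} \frac{T_i(n)}{\min(\mathrm{DOP}_i,\, p)}
        \;+\; \sum_{i=1}^{k} T_{\mathrm{interact},i}(n, p)
```

The memory, I/O, and communication parameters measured in the paper would enter through the per-phase work and interaction terms.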
{"title":"Architectural implications of the NAS MG and FT parallel benchmarks","authors":"Yuzhong Sun, Jianyong Wang, Zhiwei Xu","doi":"10.1109/APDC.1997.574038","DOIUrl":"https://doi.org/10.1109/APDC.1997.574038","url":null,"abstract":"This paper characterizes the structure and resource requirements of the NAS Parallel Benchmarks (NPB), a popular benchmark suite used to evaluate various parallel computers. The phase parallel model is used to obtain parameter values for memory, I/O, and communication latency and bandwidth requirements. These quantitative parameters are useful in the design and evaluation of various parallel computers. The results of this study is being used in designing Dawning 2000, which is NCIC's second generation MPP.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115292049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduced communication protocol for clusters
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574049
Shuo Di, Weimin Zheng
With the development of CPUs and communication networks, workstation clusters using the message-passing mechanism have come to play a crucial role in network computing. Today's clusters are mainly connected by networks running traditional communication protocols (such as TCP/IP). The high overheads of these protocols prevent many parallel applications running on clusters from efficiently using the potential computational power provided by the workstations and the networks. One way to solve this problem is to construct a reduced communication protocol. This paper gives a detailed analysis of the overheads produced by traditional protocols and provides some global strategies for designing a reduced communication protocol. Our implementation method for such a protocol is described here, together with some core algorithms and test results.
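The abstract does not give the protocol's wire format or programming interface; the sketch below only illustrates the general idea of trimming the per-message software path, here by sending a tiny fixed header over a connectionless UDP socket. The header fields, the 1400-byte payload limit, and the function name mini_send are all hypothetical.

```c
/* Illustrative sketch only (not the paper's protocol): a minimal framing
   layer over UDP that avoids TCP's connection and reliability machinery.
   Reliability and fragmentation are deliberately ignored here. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct mini_hdr {            /* 8-byte header                           */
    uint32_t src_rank;       /* logical sender id within the cluster    */
    uint32_t length;         /* payload length in bytes                 */
};

int mini_send(int sock, const struct sockaddr_in *dst,
              uint32_t rank, const void *payload, uint32_t len)
{
    char buf[sizeof(struct mini_hdr) + 1400];      /* one frame's worth  */
    struct mini_hdr h = { htonl(rank), htonl(len) };
    if (len > 1400) return -1;                     /* no fragmentation   */
    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, payload, len);
    return (int)sendto(sock, buf, sizeof h + len, 0,
                       (const struct sockaddr *)dst, sizeof *dst);
}
```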
{"title":"Reduced communication protocol for clusters","authors":"Shuo Di, Weimin Zheng","doi":"10.1109/APDC.1997.574049","DOIUrl":"https://doi.org/10.1109/APDC.1997.574049","url":null,"abstract":"With the development of CPUs and communication networks, workstation clusters using message-passing mechanism become a crucial role in the field of network computing. Today's clusters are mainly connected by networks running traditional communication protocols (such as TCP/IP). The high overheads of these protocols make many parallel applications running on clusters inefficient using the potential computation power provided by the workstations and the networks. A method to solve this problem is to construct reduced communication protocol. This paper gives a detailed analysis of overheads produced by traditional protocols and provides some global strategies to design a reduced communication protocol. Our implementation method of such a protocol is described here together with some core algorithms and the testing results.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132503121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving sparse least squares problems on massively distributed memory computers
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574029
L. Yang
In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to organize the computation of the conjugate gradient method, with a preconditioner, applied to the normal equations, and of the incomplete modified Gram-Schmidt (IMGS) preconditioner, for solving sparse least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is always limited by the global communication required for the inner products. We describe the parallelization of PCGLS and the IMGS preconditioner with two improvements. One is to assemble the results of a number of inner products collectively; the other is to create situations in which communication can be overlapped with computation. A theoretical model of computation and communication phases is presented which allows us to determine the number of processors that minimizes the runtime. Several numerical experiments on the Parsytec GC/PowerPlus are presented.
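The first improvement, assembling several inner products into one collective operation, can be sketched as follows. This is a hedged illustration rather than the paper's code, and it uses present-day MPI for concreteness; the function name combined_dots is invented.

```c
/* Sketch: compute <r,r> and <p,Ap> with a single MPI_Allreduce instead of
   two, halving the number of global synchronizations per CG iteration. */
#include <mpi.h>

void combined_dots(const double *r, const double *p, const double *Ap,
                   int n_local, double *rr, double *pAp)
{
    double local[2] = {0.0, 0.0}, global[2];
    for (int i = 0; i < n_local; i++) {
        local[0] += r[i] * r[i];       /* local part of <r,r>  */
        local[1] += p[i] * Ap[i];      /* local part of <p,Ap> */
    }
    /* One collective carries both partial sums across all processors. */
    MPI_Allreduce(local, global, 2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    *rr  = global[0];
    *pAp = global[1];
}
```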
{"title":"Solving sparse least squares problems on massively distributed memory computers","authors":"L. Yang","doi":"10.1109/APDC.1997.574029","DOIUrl":"https://doi.org/10.1109/APDC.1997.574029","url":null,"abstract":"In this paper we study the parallel aspects of PCGLS, a basic iterative method whose main idea is to organize the computation of conjugate gradient method with preconditioner applied to normal equations, and incomplete modified Gram-Schmidt (IMGS) preconditioner for solving sparse least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is always limited because of the global communication required for the inner products. We describe the parallelization of PCGLS and IMGS preconditioner by two ways of improvement. One is to assemble the results of a number of inner products collectively and the other is to create situations when communication can be overlapped with computation. A theoretical model of computation and communication phases is presented which allows us to decide the number of processors that minimizes the runtime. Several numerical experiments on Parsytec GC/PowerPlus are presented.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"11651 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114508062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On distributed snapshot algorithms
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574046
L. He, Yongqiang Sun
Snapshot algorithms are fundamental for many distributed applications and must often be executed repeatedly. We present three snapshot algorithms. The first is based on the assumption of global time; it computes channel states using several schemes. Taking a consistent cut as the global time instant, we show that the algorithm is applicable to existing snapshot algorithms. The second is a real token-passing-based algorithm for non-FIFO asynchronous distributed systems; its control-message complexity is O(n). The last algorithm is the repeated version of the second one. Using this algorithm, processes can obtain consistent global states concurrently, at their convenience.
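The abstract does not detail the token-passing algorithm, so the sketch below is only a toy simulation of why one circulation of a token around a ring costs O(n) control messages; the data layout is invented for illustration.

```c
/* Toy simulation (not the paper's algorithm): a token visits n processes
   around a ring, each process records its local state into the token, so
   a full snapshot needs exactly n token hops, i.e. O(n) control messages. */
#include <stdio.h>
#define N 4

struct token { int state[N]; int hops; };

int main(void) {
    int local_state[N] = {10, 20, 30, 40};   /* stand-ins for process states */
    struct token tok = { {0}, 0 };
    for (int p = 0; p < N; p++) {            /* one circulation of the ring  */
        tok.state[p] = local_state[p];
        tok.hops++;
    }
    printf("snapshot after %d hops:", tok.hops);
    for (int p = 0; p < N; p++) printf(" %d", tok.state[p]);
    printf("\n");
    return 0;
}
```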
{"title":"On distributed snapshot algorithms","authors":"L. He, Yongqiang Sun","doi":"10.1109/APDC.1997.574046","DOIUrl":"https://doi.org/10.1109/APDC.1997.574046","url":null,"abstract":"Snapshot algorithms are fundamental for many distributed applications and must often be executed repeatedly. We present three snapshot algorithms. The first one is based on the assumption of global time, it computes channel states using several schemes. Taking consistent cut for global time instant, we show that the algorithm is applicable for existing snapshot algorithms. The second one is a real token passing based algorithm for non-FIFO asynchronous distributed systems. Its message complexity of control messages is O(n). The last algorithm is the repeated version of the second one. Using this algorithm, processes can get consistent global states at their convenience concurrently.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"20 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114265018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A lifetime-sensitive scheduling method
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574054
Xinda Lu, Y. Hu, Jing Chen
This paper presents a lifetime-sensitive scheduling method. By shortening the lifetimes of variables in the scheduling phase, it can lighten register pressure in the register allocation phase, reduce spill code, and produce more efficient object code. Preliminary experimental results show that the method is effective.
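As a hedged illustration of the effect (not the paper's scheduling algorithm): reordering independent operations so that each temporary is consumed right after it is defined shortens live ranges, which is exactly what lowers register pressure.

```c
/* Two equivalent orderings of the same computation.  In f_long, t1 and t2
   are live at the same time, so two registers are tied up by temporaries;
   in f_short, each temporary dies before the next is defined, so one
   register suffices.  A lifetime-sensitive scheduler prefers the latter. */
int f_long(int a, int b, int c, int d, int e, int f) {
    int t1 = a * b;
    int t2 = c * d;          /* t1 is still live here */
    int r1 = t1 + e;
    int r2 = t2 + f;
    return r1 ^ r2;
}

int f_short(int a, int b, int c, int d, int e, int f) {
    int t1 = a * b;
    int r1 = t1 + e;         /* t1 dies immediately   */
    int t2 = c * d;
    int r2 = t2 + f;         /* t2 dies immediately   */
    return r1 ^ r2;
}
```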
{"title":"A lifetime-sensitive scheduling method","authors":"Xinda Lu, Y. Hu, Jing Chen","doi":"10.1109/APDC.1997.574054","DOIUrl":"https://doi.org/10.1109/APDC.1997.574054","url":null,"abstract":"This paper presents a lifetime-sensitive scheduling method. By shortening lifetimes of variables in scheduling phase, it can lighten register pressure in register allocation phase, lessen spill codes and result in more efficient object codes. The preliminary experimental results show that this method is an effective scheduling method.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114586903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic generation of parallel compiler-partial evaluation of parallel lambda language
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574060
Yongqiang Sun, Kai Lin, Yijia Chen
We describe in this paper a partial evaluator for a parallel programming language. The parallel language we present is a combination of the lambda calculus and a message-passing communication mechanism. By improving some techniques originally used for partial evaluation of sequential languages and introducing some new methods, we successfully solve the problems caused by internal semantic differences between the lambda calculus and message passing in our partial evaluator for the parallel language.
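The paper's evaluator handles the lambda calculus plus message passing; as a generic, heavily simplified illustration of the underlying idea only, the toy below folds the static parts of an expression and leaves a residual program over the unknown input (the expression language and all names are invented for this sketch).

```c
/* Toy partial evaluator (not the paper's): constants, one dynamic
   variable x, addition and multiplication.  Specialization folds every
   subexpression that does not depend on x. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { CONST, VAR, ADD, MUL } Kind;
typedef struct Expr { Kind kind; int value; struct Expr *l, *r; } Expr;

static Expr *mk(Kind k, int v, Expr *l, Expr *r) {
    Expr *e = malloc(sizeof *e);
    e->kind = k; e->value = v; e->l = l; e->r = r;
    return e;
}

/* Return a residual expression with all static subcomputations done. */
static Expr *peval(Expr *e) {
    if (e->kind == CONST || e->kind == VAR) return e;
    Expr *l = peval(e->l), *r = peval(e->r);
    if (l->kind == CONST && r->kind == CONST)
        return mk(CONST, e->kind == ADD ? l->value + r->value
                                        : l->value * r->value, NULL, NULL);
    return mk(e->kind, 0, l, r);
}

int main(void) {
    /* (2 + 3) * x specializes to the residual program 5 * x. */
    Expr *e = mk(MUL, 0,
                 mk(ADD, 0, mk(CONST, 2, NULL, NULL), mk(CONST, 3, NULL, NULL)),
                 mk(VAR, 0, NULL, NULL));
    Expr *res = peval(e);
    printf("residual: %d * x\n", res->l->value);   /* prints: residual: 5 * x */
    return 0;
}
```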
{"title":"Automatic generation of parallel compiler-partial evaluation of parallel lambda language","authors":"Yongqiang Sun, Kai Lin, Yijia Chen","doi":"10.1109/APDC.1997.574060","DOIUrl":"https://doi.org/10.1109/APDC.1997.574060","url":null,"abstract":"We describe in this paper a partial evaluator for a parallel programming language. The parallel language we present is a combination of lambda calculus and message passing communication mechanism. By improving some techniques originally used for partial evaluation of sequential language and introducing some new methods, we successfully solve the problems caused by some internal semantic differences between lambda calculus and message passing in our partial evaluator for the parallel language.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114779895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Language support for synchronous parallel critical sections
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574018
C. Kessler, H. Seidl
We introduce a new parallel programming paradigm, namely synchronous parallel critical sections. Such parallel critical sections must be seen in the context of switching between synchronous and asynchronous modes of computation. Thread farming makes it possible to generate bunches of threads that solve independent subproblems asynchronously and in parallel. In contrast, synchronous parallel critical sections allow bunches of asynchronous parallel threads to be organized so that they execute a certain task jointly and synchronously. We show how the PRAM language Fork95 can be extended with a construct, join, supporting parallel critical sections. We explain its semantics and implementation, and discuss possible applications.
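Fork95's join construct is not reproduced here; purely as a rough analogy in POSIX threads (not Fork95, and without the group formation and nesting a real join provides), a bunch of otherwise asynchronous threads can gather at a barrier and execute a region together.

```c
/* Rough analogy only: NTHREADS asynchronous workers gather at a barrier,
   pass through a "synchronous" region together, then continue
   asynchronously. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
static pthread_barrier_t gate;

static void *worker(void *arg) {
    long id = (long)arg;
    /* ... asynchronous phase: each thread works on its own subproblem ... */
    pthread_barrier_wait(&gate);          /* enter the section together */
    printf("thread %ld inside the synchronous section\n", id);
    pthread_barrier_wait(&gate);          /* leave the section together */
    /* ... asynchronous again ... */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&gate, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&gate);
    return 0;
}
```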
{"title":"Language support for synchronous parallel critical sections","authors":"C. Kessler, H. Seidl","doi":"10.1109/APDC.1997.574018","DOIUrl":"https://doi.org/10.1109/APDC.1997.574018","url":null,"abstract":"We introduce a new parallel programming paradigm, namely synchronous parallel critical sections. Such parallel critical sections must be seen in the context of switching between synchronous and asynchronous modes of computation. Thread farming allows to generate bunches of threads to solve independent subproblems asynchronously and in parallel. Opposed to that, synchronous parallel critical sections allow to organize bunches of asynchronous parallel threads to execute certain task jointly and synchronously. We show how the PRAM language Fork95 can be extended by a construct join supporting parallel critical sections. We explain its semantics and implementation, and discuss possible applications.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116780133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new architecture for branch-intensive loops
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574039
Zhizhong Tang, Chihong Zhang, Sifei Lv, Tao Yu
A new VLIW architecture, called GPMB (Global Pipelining of Multi-Branch), is discussed in this paper. The GPMB architecture can handle branch-intensive programs efficiently. With the concept of a next address function, GPMB regards branching as correctly calculating the next address. The next address function is implemented in both hardware and software in GPMB. A brief description of GPMB and a detailed example are included. A comparison with other architectures is also presented.
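The abstract does not define the next address function concretely; the fragment below is only a hypothetical illustration of the idea of treating multiway branching as an address computation (the table layout and names are invented, not GPMB's mechanism).

```c
/* Hypothetical illustration: pack the outcomes of two branch conditions
   into an index and look the successor address up in a table, so
   "branching" reduces to computing the next address in one step. */
#include <stdint.h>

uint32_t next_address(const uint32_t targets[4], int cond0, int cond1)
{
    unsigned idx = ((unsigned)(cond1 != 0) << 1) | (unsigned)(cond0 != 0);
    return targets[idx];     /* one table lookup instead of two branches */
}
```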
{"title":"A new architecture for branch-intensive loops","authors":"Zhizhong Tang, Chihong Zhang, Sifei Lvand, Tao Yu","doi":"10.1109/APDC.1997.574039","DOIUrl":"https://doi.org/10.1109/APDC.1997.574039","url":null,"abstract":"A new VLIW architecture, called GPMB (Global Pipelining of Multi-Branch), is discussed in this paper. The GPMB architecture can handle branch-intensive programs efficiently. With the concept of next address function, GPMB regards branching as correctly calculating the next address. The next address function is implemented by hardware and software in GPMB. A brief description of GPMB and a detailed example are included. A comparison with other architectures is also presented in this paper.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129051712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coherent parallel programming in C∥
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574021
Zhiwei Xu, K. Hwang
This paper presents the coherent parallel programming concept using a new parallel language called C∥ (pronounced C Parallel). The C∥ language is based on the standard C language with a small set of extended constructs for parallelism and process interaction. At the core of C∥ is a structured construct called the coherent region, which facilitates the development of coherent programs, i.e., parallel programs that are structured, determinate, terminative, and compositional. We present the basic features of C∥ and show that the coherent region is a versatile construct.
{"title":"Coherent parallel programming in C/spl par/","authors":"Zhiwei Xu, K. Hwang","doi":"10.1109/APDC.1997.574021","DOIUrl":"https://doi.org/10.1109/APDC.1997.574021","url":null,"abstract":"This paper presents the coherent parallel programming concept using a new parallel language called C/spl par/ (pronounced C Parallel). The C/spl par/ language is based on the standard C language with a small set of extended constructs for parallelism and process interaction. At the core of C/spl par/ is a structured construct called coherent region, which facilitates the development of coherent programs, i.e., parallel programs that are structured, determinate, terminative, and compositional. We present the basic features of C/spl par/ and show that coherent region is a versatile construct.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117003701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient parallel texture classification for image retrieval
Pub Date: 1997-03-19, DOI: 10.1109/APDC.1997.574009
J. You, H. Shen, H. Cohen
This paper proposes an efficient parallel approach to texture classification for image retrieval. The idea behind this method is to pre-extract texture features, in terms of texture energy measured with a 'tuned' mask, and store them in a multi-scale, multi-orientation texture class database organized as a two-dimensional linked list for querying. Each texture class sample in the database can thus be traced by its texture energy in a two-dimensional row-sorted matrix. Parallel searching strategies are introduced for quickly identifying the entries closest to the input texture throughout the given texture energy matrix. In contrast to traditional search methods, our approach incorporates different computation patterns for different numbers of available processors and provides robust, work-optimal parallel algorithms for row search and minimum finding based on the accelerated cascading technique and a dynamic processor allocation scheme. Applications of the proposed parallel search and multisearch algorithms to both single-image and multiple-image classification are discussed. The time complexity analysis shows that our proposal speeds up the classification task in a simple but dynamic manner. Examples are presented of the texture classification task applied to image retrieval of Brodatz textures, comprising various orientations and scales.
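The paper's PRAM algorithms (accelerated cascading, dynamic processor allocation) are not reproduced here; the sketch below only shows the underlying sequential query on a row-sorted energy matrix, where each row is binary-searched for the query energy and the globally closest entry is kept. Rows are independent, so in a parallel setting they could be searched by separate processors; all names are invented for this sketch.

```c
/* Sketch: find the entry of a row-sorted matrix closest in value to a
   query energy q.  Each row costs O(log cols) via binary search. */
#include <math.h>
#include <stddef.h>

void closest_entry(const double *m, size_t rows, size_t cols, double q,
                   size_t *best_r, size_t *best_c)
{
    double best = INFINITY;
    *best_r = 0; *best_c = 0;
    for (size_t r = 0; r < rows; r++) {
        const double *row = m + r * cols;
        size_t lo = 0, hi = cols;            /* first index with row[i] >= q */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (row[mid] < q) lo = mid + 1; else hi = mid;
        }
        /* the closest entry in this row is at lo or lo - 1 */
        for (size_t c = (lo > 0 ? lo - 1 : 0); c < cols && c <= lo; c++) {
            double d = fabs(row[c] - q);
            if (d < best) { best = d; *best_r = r; *best_c = c; }
        }
    }
}
```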
{"title":"An efficient parallel texture classification for image retrieval","authors":"J. You, H. Shen, H. Cohen","doi":"10.1109/APDC.1997.574009","DOIUrl":"https://doi.org/10.1109/APDC.1997.574009","url":null,"abstract":"This paper proposes an efficient parallel approach to texture classification for image retrieval. The idea behind this method is to pre-extract texture features in terms of texture energy measurement associated with a 'tuned' mask and store them in a multi-scale and multi-orientation texture class database via a two-dimensional linked list for query. Thus each texture class sample in the database can be traced by its texture energy in a two-dimensional row sorted matrix. The parallel searching strategies are introduced for fast identifying the entities closest to the input texture throughout the given texture energy matrix. In contrast to the traditional search methods, our approach incorporates different computation patterns for different cases of available processor numbers and concerns with robust and work-optimal parallel algorithms for row-search and minimum-find based an the accelerated cascading technique and the dynamic processor allocation scheme. Applications of the proposed parallel search and multisearch algorithms to both single image classification and multiple image classification are discussed. The time complexity analysis shows that our proposal will speed up the classification tasks in a simple but dynamic manner. Examples are presented of the texture classification task applied to image retrieval of Brodatz textures, comprising various orientations and scales.","PeriodicalId":413925,"journal":{"name":"Proceedings. Advances in Parallel and Distributed Computing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127048985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}