In this paper we present a coarse-grained parallel multilevel algorithm for the k-way hypergraph partitioning problem. The algorithm significantly improves on our previous work in terms of run time and scalability by improving processor utilisation, reducing synchronisation overhead and avoiding disk contention. The new algorithm is also generally applicable: it no longer requires a particular structure in the input hypergraph to achieve good partition quality. We present results showing that the algorithm scales well on very large hypergraphs with Θ(10^7) vertices and that it consistently produces partitions of better quality, by up to 27%, than the approximate partitions computed by a state-of-the-art parallel graph partitioning tool.
{"title":"A parallel algorithm for multilevel k-way hypergraph partitioning","authors":"Aleksandar Trifunović, W. Knottenbelt","doi":"10.1109/ISPDC.2004.6","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.6","url":null,"abstract":"In this paper we present a coarse-grained parallel multi-level algorithm for the k-way hypergraph partitioning problem. The algorithm significantly improves on our previous work in terms of run time and scalability behaviour by improving processor utilisation, reducing synchronisation overhead and avoiding disk contention. The new algorithm is also generally applicable and no longer requires a particular structure of the input hypergraph to achieve a good partition quality. We present results which show that the algorithm has good scalability properties on very large hypergraphs with /spl Theta/(10/sup 7/) vertices and consistently outperforms the approximate partitions produced by a state-of-the-art parallel graph partitioning tool in terms of partition quality, by up to 27%.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"31 1","pages":"114-121"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74000450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sameer Shivle, H. Siegel, A. A. Maciejewski, Tarun Banka, Kiran Chindam, S. Dussinger, Andrew Kutruff, Prashanth Penumarthy, Prakash Pichumani, Praveen Satyasekaran, David Sendek, J. Sousa, J. Sridharan, Prasanna Sugavanam, J. Velazco
An ad hoc grid is a heterogeneous computing system composed of mobile devices. The problem studied here is to statically assign resources to the subtasks of an application, which has an execution time constraint, when the resources are oversubscribed. Each subtask has a preferred version, and a secondary version that uses fewer resources. The goal is to assign resources so that the application meets its execution time constraint while minimizing the number of secondary versions used. Five resource allocation heuristics to derive near-optimal solutions to this problem are presented and evaluated.
{"title":"Mapping of subtasks with multiple versions in a heterogeneous ad hoc grid environment","authors":"Sameer Shivle, H. Siegel, A. A. Maciejewski, Tarun Banka, Kiran Chindam, S. Dussinger, Andrew Kutruff, Prashanth Penumarthy, Prakash Pichumani, Praveen Satyasekaran, David Sendek, J. Sousa, J. Sridharan, Prasanna Sugavanam, J. Velazco","doi":"10.1109/ISPDC.2004.34","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.34","url":null,"abstract":"An ad hoc grid is a heterogeneous computing system composed of mobile devices. The problem studied here is to statically assign resources to the subtasks of an application, which has an execution time constraint, when the resources are oversubscribed. Each subtask has a preferred version, and a secondary version that uses fewer resources. The goal is to assign resources so that the application meets its execution time constraint while minimizing the number of secondary versions used. Five resource allocation heuristics to derive near-optimal solutions to this problem are presented and evaluated.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"174 1","pages":"380-387"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83384773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper concerns task graph scheduling for parallel programs targeting an architecture based on dynamic SMP processor clusters with data transmissions on the fly. The assumed target architecture consists of a set of NoC modules, each containing a set of processors and memory blocks connected via a local interconnection network; the NoC modules themselves are connected via a global interconnection network. An algorithm for scheduling parallel program graphs is presented. It decomposes an initial program graph into subgraphs, which are mapped to NoC modules so as to reduce global communication between modules; these subgraphs are then structured inside the modules to exploit reads on the fly and processor switching. Reads on the fly reduce program execution time by eliminating read operations that would otherwise be executed sequentially as part of the program.
{"title":"Program graph scheduling for dynamic SMP clusters with communication on the fly","authors":"L. Masko","doi":"10.1109/ISPDC.2004.41","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.41","url":null,"abstract":"The paper concerns task graph scheduling in parallel programs for a parallel architecture based on dynamic SMP processor clusters with data transmissions on the fly. The assumed executive computer architecture consists of a set of NoC modules, each containing a set of processors and memory blocks connected via a local interconnection network. NoC modules are connected via a global interconnection network. An algorithm for scheduling parallel program graphs is presented, which decomposes an initial program graph into sub-graphs, which are then mapped to NoC modules, reducing global communication between modules. Then these subgraphs are structured inside the modules to include reads on the fly and processor switching. Reads on the fly reduce execution time of the program by elimination of read operations in linear program execution time.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"112 1","pages":"149-154"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90385374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, a cluster-computing environment is employed as the computational platform. To increase the efficiency of the system, a dynamic task scheduling algorithm is proposed which balances the load among the nodes of the cluster. The technique is dynamic, non-preemptive and adaptive, and it uses a mix of centralised and decentralised policies. Based on the divide-and-conquer principle, the algorithm models the cluster as hyper-grids and balances the load among them: hyper-grids of dimension k are recursively divided into hyper-grids of dimension k - 1 until the dimension is 1, at which point all the nodes of the cluster are almost equally loaded. The optimum hyper-grid dimension is chosen in order to achieve the best performance. The simulation results demonstrate the effectiveness of the algorithm. In addition, we determine the critical points (lower bounds) at which the algorithm should be triggered.
{"title":"Dynamic task scheduling in computing cluster environments","authors":"I. Savvas, Mohand Tahar Kechadi","doi":"10.1109/ISPDC.2004.21","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.21","url":null,"abstract":"In this study, a cluster-computing environment is employed as a computational platform. In order to increase the efficiency of the system, a dynamic task scheduling algorithm is proposed, which balances the load among the nodes of the cluster. The technique is dynamic, nonpreemptive, adaptive, and it uses a mixed centralised and decentralised policies. Based on the divide and conquer principle, the algorithm models the cluster as hyper-grids and then balances the load among them. Recursively, the hyper-grids of dimension k are divided into grids of dimensions k - 1, until the dimension is 1. Then, all the nodes of the cluster are almost equally loaded. The optimum dimension of the hyper-grid is chosen in order to achieve the best performance. The simulation results show the effective use of the algorithm. In addition, we determined the critical points (lower bounds) in which the algorithm can to be triggered.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"1 1","pages":"372-379"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83606771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we discuss an execution methodology for fine-grain parallel programs based on the macro data flow paradigm, applied to an FDTD program, which is an example of a fine-grain parallel application with regular computations executed in an irregular domain. Parallel applications are executed on a MIMD system in which message passing is implemented with an RDMA facility based on a rotating-buffer control infrastructure. We show that such an execution model for fine-grain parallel applications facilitates the control and synchronization of the resources involved in computation and communication. Execution based on the macro data flow paradigm reduces the synchronization overhead that cannot be avoided in message-passing communication. This comes at the cost of processor time spent monitoring the states of program macro nodes, since we use a conventional von Neumann system whose architecture provides no support for macro data flow execution. To achieve the best speedup, the assignment of macro nodes to physical processors is preceded by a static analysis of the program code, in which optimal decisions on macro node definition and allocation have to be taken.
{"title":"Program implementation based on macro data flow paradigm with RDMA communication support","authors":"A. Smyk, M. Tudruj","doi":"10.1109/ISPDC.2004.42","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.42","url":null,"abstract":"In this paper, we discuss execution methodology for parallel fine grain programs based on the macro data flow paradigm applied to a FDTD program which is an example of fine grain parallel application based on regular computations executed in an irregular domain. Parallel applications are executed in a MIMD system with message passing implemented with RDMA facility based on rotating buffers control infrastructure. It is shown that such execution model for fine grain parallel applications can facilitate control and synchronization of resources involved in computations and communication. Execution based on macro data flow paradigm reduces synchronization overhead which can not be avoided in message passing communication. This is achieved at a cost of processor time spent on monitoring of program macro node states since we use here a traditional von Neuman system with architectural model unsupported for macro data flow execution. To achieve the best speedup, assignment of macro nodes to physical processors is proceeded by static analysis of program code and optimal decisions as regards node definition/allocation have to be taken.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"18 1","pages":"270-276"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83775510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In its current state, service-oriented grid computing focuses on the unification of resources through virtualization, to enable on-demand distributed computing within a preconfigured environment. Organizations or inter-organizational communities willing to share their computational resources typically create a centrally planned grid, where dedicated grid administrators manage the nodes and the offered grid services. In this paper, we present the idea of a spontaneously formed, service-oriented ad hoc grid to harness the unused resources of idle networked workstations and high-performance computing nodes on demand. We discuss the requirements of such an ad hoc grid, show how the service-oriented computing paradigm can be used to realize it, and present a proof-of-concept implementation based on the Globus Toolkit 3.0. The features of this system are peer-to-peer-based node discovery, automatic node property assessment, hot deployment of services into a running system and added inter-service security.
{"title":"Towards a service-oriented ad hoc grid","authors":"Matthew Smith, T. Friese, Bernd Freisleben","doi":"10.1109/ISPDC.2004.56","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.56","url":null,"abstract":"In its current state, service-oriented grid computing focuses on the unification of resources through virtualization, to enable on demand distributed computing within a preconfigured environment. Organizations or inter-organizational communities willing to share their computational resources typically create a centrally planned grid, where dedicated grid administrators manage the nodes and the offered grid services. In this paper, we present the idea of a spontaneously formed, service-oriented ad hoc grid to harness the unused resources of idle networked workstations and high-performance computing nodes on demand. We discuss the requirements of such an ad hoc grid, show how the service-oriented computing paradigm can be used to realize it and present a proof-of-concept implementation based on the Globus Toolkit 3.0. The features of this system are peer-to-peer based node discovery, automatic node property assessment, hot deployment of services into a running system and added inter-service security.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"5 1","pages":"201-208"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73779426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A methodology for parallel programming in a new graphical program design system (GRADE-S) based on global application state monitoring is presented. A proposal is given on how to prepare a parallel application for execution in the new GRADE. A notation for skeleton versions of GRADE programs is explained. An example parallel program for the travelling salesman problem (TSP) is then given, and the GUI of the new GRADE corresponding to the TSP program is illustrated with screenshots of the relevant windows.
{"title":"Parallel program graphical design with program execution control based on global application states","authors":"M. Tudruj, J. Borkowski, D. Kopanski","doi":"10.1109/ISPDC.2004.38","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.38","url":null,"abstract":"A methodology for parallel programming in new graphical program design system (GRADE-S) based on global application state monitoring is presented. A proposal is given on how to prepare a parallel application for execution in new GRADE. A notation for skeleton versions of GRADE programs is explained. Then, an example of the travelling salesman problem (TSP) parallel program is given. GUI in the new GRADE corresponding to the TSP program in the new GRADE is shown with the shots of corresponding windows.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"21 1","pages":"240-247"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81731828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There are two main approaches to designing parallel languages. The first holds that parallel computing demands new programming concepts and a radical intellectual change in the way we think about programming, compared to sequential computing; the design of such a parallel language must therefore introduce new constructs and new programming methodologies. The second holds that there is no need to reinvent the wheel: serial languages can be extended to support parallelism. The motivation behind this approach is to keep the language as friendly as possible for the programmer, who is the main bridge toward wider acceptance of a new language. In this paper we present a qualitative evaluation of two contemporary parallel languages: OpenMP-C and Unified Parallel C (UPC). Both are explicit parallel programming languages based on the ANSI C standard. OpenMP-C was designed for shared-memory architectures and extends the base language with compiler directives that annotate the original source code. UPC, on the other hand, was designed for distributed shared-memory architectures and extends the base language with new parallel constructs. We deconstruct each language into its basic components, show examples, analyse them in detail, compare them and draw conclusions.
{"title":"Analytic comparison of two advanced C language-based parallel programming models","authors":"A. Marowka","doi":"10.1109/ISPDC.2004.11","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.11","url":null,"abstract":"There are two main approaches for designing parallel language. The first approach states that parallel computing demands new programming concepts and radical intellectual changes regarding the way we think about programming, as compared to sequential computing. Therefore, the design of such a parallel language must present new constructs and new programming methodologies. The second approach states that there is no need to reinvent the wheel, and serial languages can be extended to support parallelism. The motivation behind this approach is to keep the language as friendly as possible for the programmer who is the main bridge toward wider acceptance of the new language. In this paper we present a qualitative evaluation of two contemporary parallel languages: OpenMP-C and Unified Parallel C (UPC). Both are explicit parallel programming languages based on the ANSI C standard. OpenMP-C was designed for shared-memory architectures and extends the base-language by using compiler directives that annotate the original source-code. On the other hand, UPC was designed for distribute-shared memory architectures and extends the base-language by new parallel constructs. We deconstruct each parallel language into its basic components, show examples, make a detailed analysis, compare them, and finally draw some conclusions.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"170 ","pages":"284-291"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91519634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Krzysztof Chmiel, Dominik Tomiak, M. Gawinecki, P. Kaczmarek, M. Szymczak, M. Paprzycki
Agent-oriented programming is often described as the next breakthrough in the development and implementation of large-scale complex software systems. At the same time it is rather difficult to find successful applications of agent technology, particularly when large-scale systems are considered. The aim of this paper is to investigate whether one of the possible limiting factors is the scalability of existing agent technology. We chose the JADE agent platform as our technology of choice and investigated its efficiency in a number of test cases. The results of our experiments are presented and discussed.
{"title":"Testing the efficiency of JADE agent platform","authors":"Krzysztof Chmiel, Dominik Tomiak, M. Gawinecki, P. Kaczmarek, M. Szymczak, M. Paprzycki","doi":"10.1109/ISPDC.2004.49","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.49","url":null,"abstract":"Agent oriented programming is often described as the next breakthrough in development and implementation of large-scale complex software system. At the same time it is rather difficult to find successful applications of agent technology, in particular precisely when large-scale systems are considered. The aim of this paper is to investigate if one of the possible limits may be the scalability of existing agent technology. We have picked JADE agent platform as technology of choice and investigated its efficiency in a number of test cases. Results of our experiments are presented and discussed.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"1 1","pages":"49-56"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88692455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consistent global state (CGS) monitoring is usually performed with the help of a central monitor. The monitor must receive and handle the local state reports of all processes, and in an on-line monitoring environment it can easily become overloaded. We consider CGS detection performed hierarchically: application processes are split into groups, lower-level monitors communicate with their assigned process groups and report partial results to a top-level monitor, and the top-level monitor combines the received data to form the CGS. Several variants of hierarchical algorithms for strongly consistent global state detection are devised, each using a different local clock synchronization pattern. The analysis shows that hierarchical CGS algorithms efficiently distribute the network and computational load caused by CGS monitoring without introducing significant additional overhead. The analysis is confirmed by preliminary test results.
{"title":"Hierarchical detection of strongly consistent global states","authors":"J. Borkowski","doi":"10.1109/ISPDC.2004.30","DOIUrl":"https://doi.org/10.1109/ISPDC.2004.30","url":null,"abstract":"Consistent global state (CGS) monitoring is performed usually with the help of a central monitor. The monitor must receive process local state reports and handle them. In an on-line monitoring environment it can become easily overloaded. We consider CGS detection in a hierarchical way. Application processes are split into groups. Lower-level monitors communicate with an assigned process group and report partial results to the top-level monitor. The top-level monitor combines received data to form CGS. A few variants of hierarchical algorithms for Strongly CGS detection are devised, each variant uses different local clock synchronization pattern. The analysis shows that hierarchical CGS algorithms efficiently distribute network and computational load caused by CGS monitoring without introducing significant additional overhead. The analysis is confirmed by preliminary test results.","PeriodicalId":62714,"journal":{"name":"骈文研究","volume":"32 1","pages":"256-261"},"PeriodicalIF":0.0,"publicationDate":"2004-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77762621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}