Pub Date: 1999-09-01
DOI: 10.1002/(SICI)1096-9128(199909)11:11%3C655::AID-CPE449%3E3.0.CO;2-7
M. Fleury, N. Sarvan, A. Downton, A. Clark
The pipelines of processor farms (PPF) design pattern, intended for continuous-flow embedded systems, has been augmented by a software toolkit at the system analysis level. Other relevant approaches to system support employing tools are reviewed. The PPF structure supports incrementally scalable systems which can meet real-time specifications. An outline of the design and development cycle of PPF systems follows. The paper considers in detail the prediction component of the cycle. A graphical simulation tool for modelling asynchronous pipeline behaviour uses a Java-based visual display. An extended example showing how the performance tool supports PPF design principles concludes the paper.
Title: Methodology and tools for system analysis of parallel pipelines. Journal: Concurr. Pract. Exp.
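The scalability claim in the PPF abstract above can be illustrated with a rough throughput estimate. This is only a sketch under stated assumptions: each pipeline stage is a farm of identical workers, and communication and buffering costs are ignored. The function names and the stage figures are hypothetical and are not taken from the paper or its toolkit.

```python
import math

def stage_throughput(workers, service_time):
    """Items per second a farm of identical workers can sustain (assumed model)."""
    return workers / service_time

def pipeline_throughput(stages):
    """A pipeline can run no faster than its slowest stage."""
    return min(stage_throughput(w, t) for w, t in stages)

def workers_needed(service_time, target_rate):
    """Smallest farm size whose throughput reaches target_rate items per second."""
    return math.ceil(service_time * target_rate)

# Hypothetical three-stage pipeline: (workers, seconds per item) for each stage.
stages = [(1, 0.125), (5, 0.5), (2, 0.25)]
rate = pipeline_throughput(stages)
```

Under these assumptions, raising the pipeline rate means adding workers only to the bottleneck stages, which is one way to read "incrementally scalable".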
Pub Date: 1999-09-01
DOI: 10.1002/(SICI)1096-9128(199909)11:11%3C615::AID-CPE447%3E3.0.CO;2-H
George R. Ribeiro-Justo, T. Delaitre, M. Zemerly, S. Winter
Behavioural and performance analysis is a fundamental problem in the development of parallel (and distributed) programs. To address this problem, models and supporting environments are required to enable designers to build and analyse their programs. The model we put forward in this paper combines graphical and textual representations of the program structure and uses discrete-event simulation for performance and behaviour predictions. A graphical environment supports our model, providing, amongst other features, a graphical editor, a simulation engine and a performance and behaviour visualisation tool. A number of case studies using this environment are also provided for illustration and validation of our model. In comparisons of real execution with simulation for the case studies, prediction errors were within 10%.
Title: Accurate performance prediction using visual prototypes. Journal: Concurr. Pract. Exp.
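The discrete-event simulation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's simulation engine: it assumes independent tasks dispatched in order to identical workers, and the task times and the "measured" figure passed to relative_error are hypothetical.

```python
import heapq

def simulate(tasks, n_workers):
    """Predict the finishing time of tasks run on n_workers identical workers.

    tasks is a list of service times. Each task goes to whichever worker
    becomes free first; "worker free" events sit on a priority queue keyed
    by simulated time.
    """
    events = [(0.0, w) for w in range(n_workers)]  # every worker is free at t=0
    heapq.heapify(events)
    finish = 0.0
    for service in tasks:
        now, worker = heapq.heappop(events)  # earliest "worker free" event
        done = now + service
        finish = max(finish, done)
        heapq.heappush(events, (done, worker))
    return finish

def relative_error(predicted, measured):
    """Prediction error as a fraction of the measured value."""
    return abs(predicted - measured) / measured

predicted = simulate([4.0, 3.0, 2.0, 1.0], n_workers=2)
error = relative_error(predicted, measured=5.4)  # 5.4 is a made-up measurement
```

Comparing such a prediction against a real run is the kind of check the abstract reports as staying within 10%.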
Pub Date: 1999-08-25
DOI: 10.1002/(SICI)1096-9128(19990825)11:10%3C529::AID-CPE439%3E3.0.CO;2-S
Jan-Jan Wu, Marina C. Chen, J. Cowie
In this paper, we give an overview of the results of the CRAFT optimising compiler project (Fortran 90/HPF subset compilers). We start by describing the theoretical framework within which we designed program transformations for the optimization of inter- and intraprocedural data motion, as well as the optimizations for parallel loops; we then describe the implementation of the CRAFT compilers for Thinking Machines’ CM-2 and CM-5. We report results from experiments on the Connection Machine CM-5, the IBM SP-2 and a network of UltraSparc workstations. The results demonstrate that these optimizations can achieve significant object code performance improvement. Copyright 1999 John Wiley & Sons, Ltd.
Title: CRAFT: a framework for F90/HPF compiler optimizations. Journal: Concurr. Pract. Exp.
Pub Date: 1999-08-10
DOI: 10.1002/(SICI)1096-9128(19990810)11:9%3C479::AID-CPE441%3E3.0.CO;2-S
R. Olsson
Reproducing the execution of a concurrent program is important in debugging and testing. It requires that, regardless of the actual order in which processes may execute, the reproduced execution is identical, with respect to the order in which certain activities occur, to a previously recorded execution. This paper presents a solution to the reproducibility problem for programs written in the SR concurrent programming language. Our solution transforms an arbitrary SR program into one for recording an event sequence and one for replaying from an event sequence. SR provides a rich collection of synchronization mechanisms, including rendezvous, asynchronous message passing, remote procedure call, and dynamic process creation. SR language features allow: flexible invocation servicing (e.g., use of invocation parameters in selecting an invocation to service in message passing or rendezvous); dynamically created processes and resource (module) instances; dynamic communication paths between processes; and dynamic distribution of programs across multiple machines. Because of these features, adaptations of previous solutions to the reproducibility problem for other languages and notations do not work for SR. Our solution handles all the above features. It results in a naturally distributed control algorithm for programs that are distributed. This paper also describes the implementations of our transformation tools.
Title: Reproducible execution of SR programs. Journal: Concurr. Pract. Exp.
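The record and replay idea in the abstract above can be sketched generically. This is an illustration in Python threads, not SR, and it is not the paper's program transformation: the Sequencer class, the run helper, and the choice of one shared lock are all assumptions made for the example. A first run records the order in which processes perform events; a second run blocks each process until the recorded sequence says it is that process's turn.

```python
import threading

class Sequencer:
    """Record the order of events, or force a previously recorded order."""

    def __init__(self, replay=None):
        self.replay = list(replay) if replay is not None else None
        self.trace = []  # order observed in this run
        self.cond = threading.Condition()

    def event(self, who, action):
        with self.cond:
            if self.replay is not None:
                # Wait until the recorded sequence names this process next.
                self.cond.wait_for(lambda: self.replay[len(self.trace)] == who)
            action()
            self.trace.append(who)
            self.cond.notify_all()

def run(seq, n_procs=3, n_events=2):
    """Start n_procs threads that each perform n_events sequenced events."""
    out = []

    def proc(pid):
        for _ in range(n_events):
            seq.event(pid, lambda: out.append(pid))

    threads = [threading.Thread(target=proc, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

recorder = Sequencer()
first = run(recorder)                             # order depends on scheduling
replayed = run(Sequencer(replay=recorder.trace))  # order forced to match
```

The central sequencer used here is the simplest possible control scheme; the abstract describes a distributed control algorithm, which this sketch does not attempt.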
Pub Date: 1999-08-10
DOI: 10.1002/(SICI)1096-9128(19990810)11:9%3C461::AID-CPE436%3E3.0.CO;2-2
J. R. García, C. Rodríguez, Daniel González-Morales, F. Almeida
Title: Predicting the execution time of message passing models. Journal: Concurr. Pract. Exp.
Pub Date: 1999-07-01
DOI: 10.1002/(SICI)1096-9128(199907)11:8%3C387::AID-CPE432%3E3.0.CO;2-4
S. Clarke, S. Dandamudi
Networks of workstations (NOWs) can be used for parallel processing by using public domain software like PVM. However, NOW-based parallel processing suffers from node heterogeneity, background load variations, and a high-latency, low-bandwidth communication network. Previous studies on load sharing in NOW-based systems have indicated that, for applications using the work-pile model, a simple load sharing scheme in which the master process gives a fixed amount of work to the slave processes performs as well as other, more complex schemes. In this paper, we propose a new adaptive load sharing scheme and evaluate its performance using a Pentium-based NOW machine. The communication network used in the system consists of the standard 10 Mbps Ethernet and the 100 Mbps fast Ethernet. We use both of these networks to study their impact on the performance of our new policy. The results presented here indicate that the new policy is useful for computation-intensive applications. Copyright 1999 John Wiley & Sons, Ltd.
Title: Usefulness of adaptive load sharing for parallel processing on networks of workstations. Journal: Concurr. Pract. Exp.
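The work-pile model in the abstract above, where a master hands work to slaves on request, can be sketched as a small simulation. The abstract does not describe the adaptive policy, so the speed-proportional rule below is a purely hypothetical stand-in, not the scheme proposed in the paper. The toy also ignores communication cost and background load, and the node speeds are made up.

```python
import heapq

def work_pile(total_units, speeds, chunk_for):
    """Simulate a master handing out work to slaves as they become idle.

    speeds[i] is the rate of slave i in units per second. chunk_for(i, remaining)
    returns how many units the master offers slave i; at least one unit is
    always handed out. Returns the time at which the last slave finishes.
    """
    idle = [(0.0, i) for i in range(len(speeds))]  # (time slave is idle, slave)
    heapq.heapify(idle)
    remaining, makespan = total_units, 0.0
    while remaining > 0:
        now, i = heapq.heappop(idle)  # next slave to ask for work
        n = max(1, min(remaining, chunk_for(i, remaining)))
        remaining -= n
        done = now + n / speeds[i]
        makespan = max(makespan, done)
        heapq.heappush(idle, (done, i))
    return makespan

speeds = [4.0, 1.0]  # hypothetical heterogeneous nodes

# Fixed scheme: every request receives the same amount of work.
fixed = work_pile(20, speeds, lambda i, rem: 5)

# Hypothetical adaptive rule: half of the slave's speed-proportional share
# of the remaining work.
adaptive = work_pile(20, speeds, lambda i, rem: int(rem * speeds[i] / sum(speeds) / 2))
```

Because this toy has no communication cost, it says nothing about the trade-offs measured in the paper on 10 Mbps and 100 Mbps Ethernet; it only shows where a chunk-size policy plugs into the work-pile loop.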