A QoS performance measure framework for distributed heterogeneous networks
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823388
Jong-Kook Kim, D. Hensgen, T. Kidd, H. Siegel, David St. John, C. Irvine, T. Levin, N. W. Porter, V. Prasanna, R. F. Freund
In a distributed heterogeneous computing environment, users' tasks are allocated resources to simultaneously satisfy, to varying degrees, the tasks' different, and possibly conflicting, quality of service (QoS) requirements. When the total demand placed on system resources by the tasks, for a given interval of time, exceeds the resources available, some tasks will receive degraded service or no service at all. One part of a measure to quantify the success of a resource management system (RMS) in such a distributed environment is the collective value of the tasks completed during an interval of time, as perceived by the user, application, or policy maker. The flexible integrated system capability (FISC) ratio introduced here is a measure for quantifying this collective value. The FISC ratio is a multi-dimensional measure, and may include priorities, versions of a task or data, deadlines, situational mode, security, application- and domain-specific QoS, and dependencies. In addition to being used for evaluating and comparing RMSs, the FISC ratio can be incorporated as part of the objective function in a system's scheduling heuristics.
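The abstract does not reproduce the paper's formal definition, but the core idea (the collective value actually delivered during an interval, divided by the maximum value attainable) can be sketched. The field names and the weighting scheme below are illustrative assumptions, not the authors' formulation:

```python
# Illustrative sketch of a FISC-style ratio (assumed form, not the paper's
# definition): value delivered by completed tasks, weighted by priority and
# by the worst-satisfied QoS dimension, divided by the value of an ideal
# outcome in which every task is fully satisfied.

def fisc_ratio(tasks):
    """tasks: list of dicts with assumed keys 'priority' (a weight),
    'qos_satisfaction' (list of per-dimension scores in [0, 1]),
    and 'completed' (bool)."""
    delivered = sum(
        t["priority"] * min(t["qos_satisfaction"])  # worst QoS dimension bounds value
        for t in tasks if t["completed"]
    )
    ideal = sum(t["priority"] for t in tasks)       # every task fully satisfied
    return delivered / ideal if ideal else 0.0

tasks = [
    {"priority": 3.0, "qos_satisfaction": [1.0, 0.8], "completed": True},
    {"priority": 1.0, "qos_satisfaction": [0.5, 0.9], "completed": False},
]
print(fisc_ratio(tasks))  # 0.6 -- degraded and missed tasks lower the collective value
```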
{"title":"A QoS performance measure framework for distributed heterogeneous networks","authors":"Jong-Kook Kim, D. Hensgen, T. Kidd, H. Siegel, David St. John, C. Irvine, T. Levin, N. W. Porter, V. Prasanna, R. F. Freund","doi":"10.1109/EMPDP.2000.823388","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823388","url":null,"abstract":"In a distributed heterogeneous computing environment, users' tasks are allocated resources to simultaneously satisfy, to varying degrees, the tasks' different, and possibly conflicting, quality of service (QoS) requirements. When the total demand placed on system resources by the tasks, for a given interval of time, exceeds the resources available, some tasks will receive degraded service or no service at all. One part of a measure to quantify the success of a resource management system (RMS) in such a distributed environment is the collective value of the tasks completed during an interval of time, as perceived by the user, application, or policy maker. The flexible integrated system capability (FISC) ratio introduced here is a measure for quantifying this collective value. The FISC ratio is a multi-dimensional measure, and may include priorities, versions of a task or data, deadlines, situational mode, security, application- and domain-specific QoS, and dependencies. In addition to being used for evaluating and comparing RMS, the FISC ratio can be incorporated as part of the objective function in a system's scheduling heuristics.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128211955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

An efficient algorithm for the physical mapping of clustered task graphs onto multiprocessor architectures
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823437
N. Koziris, M. Romesis, P. Tsanakas, G. Papakonstantinou
The most important issue in sequential program parallelisation is the efficient assignment of computations to different processing elements. In the past, many approaches to efficient program parallelisation have been proposed, considering various models for the parallel programs and the target architectures. The most widely used parallelism description model is the task graph model with precedence constraints. Nevertheless, as far as the physical mapping of tasks onto parallel architectures is concerned, little research has produced practical results. It is well known that the physical mapping problem is NP-hard in the strong sense, thus allowing only for heuristic approaches. Most researchers or tool programmers use exhaustive algorithms, or the classical method of simulated annealing. This paper presents an alternative approach to the mapping problem. Given the graph of clustered tasks and the graph of the target distributed architecture, our heuristic finds a mapping by first placing the highly communicative tasks on adjacent nodes of the processor network. Once these "backbone" tasks are mapped there is no backtracking, thus achieving low complexity. The remaining tasks are then placed, beginning with those close to the "backbone" tasks. The paper concludes with performance and comparison results that reveal the method's efficiency.
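A minimal sketch of the two-phase, no-backtracking placement the abstract describes; the graph representations, tie-breaking, and fallback rules here are our assumptions, not the paper's published algorithm:

```python
# Backbone-first mapping sketch (assumed details): the heaviest-communicating
# task pairs are placed on adjacent processors first and never moved again;
# the remaining tasks are then attached next to their already mapped
# neighbours. One pass of the second phase is shown; long chains of unmapped
# tasks would need the pass repeated until every task is placed.

def map_tasks(comm, proc_adj):
    """comm: dict {(task_a, task_b): communication volume}.
    proc_adj: dict {proc: set of adjacent procs} for the target network."""
    mapping, free = {}, set(proc_adj)
    edges = sorted(comm.items(), key=lambda e: -e[1])     # heaviest first
    # Phase 1: map "backbone" pairs onto adjacent free processors.
    for (a, b), _ in edges:
        if a not in mapping and b not in mapping and len(free) >= 2:
            p = free.pop()
            q = next((r for r in proc_adj[p] if r in free), None)
            if q is None:                  # no free neighbour: any free proc
                q = free.pop()
            else:
                free.remove(q)
            mapping[a], mapping[b] = p, q  # no backtracking from here on
    # Phase 2: place remaining tasks close to their mapped neighbours.
    for (a, b), _ in edges:
        for t, other in ((a, b), (b, a)):
            if t not in mapping and other in mapping and free:
                near = [r for r in proc_adj[mapping[other]] if r in free]
                p = near[0] if near else free.pop()
                free.discard(p)
                mapping[t] = p
    return mapping

comm = {("t0", "t1"): 9, ("t1", "t2"): 4, ("t2", "t3"): 1}
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(map_tasks(comm, ring))   # the heaviest pair lands on adjacent processors
```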
{"title":"An efficient algorithm for the physical mapping of clustered task graphs onto multiprocessor architectures","authors":"N. Koziris, M. Romesis, P. Tsanakas, G. Papakonstantinou","doi":"10.1109/EMPDP.2000.823437","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823437","url":null,"abstract":"The most important issue in sequential program parallelisation is the efficient assignment of computations into different processing elements. In the past, too many approaches were devoted in efficient program parallelization considering various models for the parallel programs and the target architectures. The most widely used parallelism description model is the task graph model with precedence constraints. Nevertheless, as far as physical mapping of tasks onto parallel architectures is concerned little research has given practical results. It is well known that the physical mapping problem is NP-hard in the strong sense, thus allowing only for heuristic approaches. Most researchers or tool programmers use exhaustive algorithms, or the classical method of simulated annealing. This paper presents an alternative approach onto the mapping problem. Given the graph of clustered tasks, and the graph of the target distributed architecture, our heuristic finds a mapping by first placing the highly communicative tasks on adjacent nodes of the processor network. Once these \"backbone\" tasks are mapped there is no backtracking, thus achieving low complexity. Therefore, the remaining tasks are placed beginning from those close to the \"backbone\" tasks. The paper concludes with performance and comparison results which reveal the method's efficiency.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128212749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Tailoring a self-distributing architecture to a cluster computer environment
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823406
R. Moore, B. Klauer, K. Waldschmidt
This paper analyzes the consequences of existing network structure for the design of a protocol for a radical COMA (Cache Only Memory Architecture). Parallel computing today faces two significant challenges: the difficulty of programming and the need to leverage existing "off-the-shelf" hardware. The difficulty of programming parallel computers can be split into two problems: distributing the data, and distributing the computation. Parallelizing compilers address both problems, but have limited application outside the domain of loop-intensive "scientific" code. Conventional COMAs provide an adaptive, self-distributing solution to data distribution, but do not address computation distribution. Our proposal leverages parallelizing compilers, and then extends COMA to provide adaptive self-distribution of both data and computation. The radical COMA protocols can be implemented in hardware, software, or a combination of both. When, however, the implementation is constrained to operate in a cluster computing environment (that is, to use only existing, already installed hardware), the protocols have to be reengineered to accommodate the deficiencies of the hardware. This paper identifies the critical quantities of various existing network structures, and discusses their repercussions for protocol design. A new protocol is presented in detail.
{"title":"Tailoring a self-distributing architecture to a cluster computer environment","authors":"R. Moore, B. Klauer, K. Waldschmidt","doi":"10.1109/EMPDP.2000.823406","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823406","url":null,"abstract":"This paper analyzes the consequences of existing network structure for the design of a protocol for a radical COMA (Cache Only Memory Architecture). Parallel computing today faces two significant challenges: the difficulty of programming and the need to leverage existing \"off-the-shelf\" hardware. The difficulty of programming parallel computers can be split into two problems: distributing the data, and distributing the computation. Parallelizing compilers address both problems, but have limited application outside the domain of loop intensive \"scientific\" code. Conventional COMAs provide an adaptive, self-distributing solution to data distribution, but do not address computation distribution. Our proposal leverages parallelizing compilers, and then extends COMA to provide adaptive self-distribution of both data and computation. The radical COMA protocols can be implemented in hardware, software, or a combination of both. When, however, the implementation is constrained to operate in a cluster computing environment (that is, to use only existing, already installed hardware), the protocols have to be reengineered to accommodate the deficiencies of the hardware. This paper identifies the critical quantities of various existing network structures, and discusses their repercussions for protocol design. A new protocol is presented in detail.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123747535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Modelling message-passing programs for static mapping
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823416
C. Roig, A. Ripoll, M. A. Senar, F. Guirado, E. Luque
An efficient mapping of a parallel program onto the processors is vital for achieving high performance on a parallel computer. When the structure of the parallel program, in terms of its task execution times, task dependencies, and amount of communication data, is known a priori, mapping can be accomplished statically at compile time. Mapping algorithms start from a parallel application model and automatically map tasks to processors in order to minimise the execution time of the program. In this paper we discuss the current models used in mapping parallel programs, the Task Precedence Graph (TPG) and the Task Interaction Graph (TIG), and we define a new model called the Temporal Task Interaction Graph (TTIG). The contribution of the TTIG is that it enhances these two previous models with the ability to explicitly capture the potential degree of parallel execution between adjacent tasks, allowing the development of efficient mapping algorithms. Experiments have been performed to show the effectiveness of the TTIG model for a set of graphs. The results are compared with the optimal assignment and with those obtained using the TIG model, and they confirm that better assignments can be obtained using the TTIG model.
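A sketch of what a TTIG-like structure could carry, under attribute names of our choosing: TIG-style nodes and communication edges, plus a per-edge degree of potential parallel execution that interpolates between TPG-like (strictly ordered) and TIG-like (fully concurrent) behaviour:

```python
# TTIG-like model sketch (attribute names are assumptions): tasks carry
# execution times, edges carry communication volumes, and each edge also
# records how far the two adjacent tasks can overlap in time.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: float          # estimated computation cost

@dataclass
class TTIGEdge:
    a: Task
    b: Task
    comm_volume: float        # data exchanged between a and b
    parallelism: float        # 0.0 = strictly sequential (TPG-like),
                              # 1.0 = fully concurrent (TIG-like)

def pair_time_on_two_procs(e: TTIGEdge) -> float:
    """Assumed use of the model: estimated time for the pair when mapped to
    different processors, overlapping by the edge's parallelism degree."""
    serial = e.a.exec_time + e.b.exec_time
    overlapped = max(e.a.exec_time, e.b.exec_time)
    return serial - e.parallelism * (serial - overlapped)

t1, t2 = Task("t1", 10.0), Task("t2", 6.0)
print(pair_time_on_two_procs(TTIGEdge(t1, t2, 100.0, 0.0)))  # 16.0: no overlap
print(pair_time_on_two_procs(TTIGEdge(t1, t2, 100.0, 1.0)))  # 10.0: full overlap
```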
{"title":"Modelling message-passing programs for static mapping","authors":"C. Roig, A. Ripoll, M. A. Senar, F. Guirado, E. Luque","doi":"10.1109/EMPDP.2000.823416","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823416","url":null,"abstract":"An efficient mapping of a parallel program in the processors is vital for achieving a high performance on a parallel computer. When the structure of the parallel program in terms of its task execution times, task dependencies, and amount communication data, is known a priori, mapping can be accomplished statically at compile time. Mapping algorithms start from a parallel application model and map automatically tasks to processors in order to minimise the execution time of the program. In this paper we discuss the current models used in mapping parallel programs: Task Precedence Graph (TPG), Task Interaction Graph (TIG) and we define a new model called Temporal Task Interaction Graph (TTIG). The contribution of the TTIG is that it enhances these two previous models with the ability to explicitly capture the potential degree of parallel execution between adjacent tasks allowing the development of efficient mapping algorithms. Experimentation had been performed in order to show the effectiveness of TTIG model for a set of graphs. The results are compared with the optimal assignment and the obtained using TIG model and they confirm that using the TTIG model, better assignments can be obtained.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132097948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Using agent wills to provide fault-tolerance in distributed shared memory systems
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823426
A. Rowstron
In this paper we describe how we use mobile objects to provide distributed programs coordinating through a persistent distributed shared memory (DSM) with tolerance to sudden agent failure, and use the increasingly popular Linda-like tuple space languages as an example implementation of the concept. In programs coordinating and communicating through a DSM, a data structure is shared between multiple agents, and the agents update the shared structure directly. However, if an agent should suddenly fail, it is often hard for the remaining agents to make the data structures consistent with the new application state. For example, consider a data structure that contains a list of active agents. In such a case, transactions can be used when adding and removing agent names from the list, ensuring that the data structure is consistent and does not become corrupted should an agent fail. However, if the agent fails after its name has been added, how does the application ensure the list is correct? We argue that using mobile objects we can provide wills for the agents, effectively enabling them to ensure the shared data structure is application-consistent even once they have failed. We show how we have integrated the use of agent wills into a Linda system and show that we have not increased the complexity of program writing. The integration is simple and general, does not alter the underlying semantics of the operations performed in the will, and the use of mobility is transparent to the programmer.
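A toy sketch of the will mechanism over a Linda-like tuple space; all names here (register_will, agent_failed, the tuple layout) are illustrative, not the paper's system:

```python
# Agent-will sketch over a toy Linda-like tuple space (illustrative names):
# an agent ships a will (a mobile object, here just a callable) to the
# server when it starts; when the server's failure detector reports the
# agent dead, the will runs against the shared space and repairs the data
# structure on the agent's behalf.

class TupleSpace:
    def __init__(self):
        self.tuples, self.wills = [], {}

    def out(self, t):                        # Linda 'out': deposit a tuple
        self.tuples.append(t)

    def inp(self, t):                        # Linda 'inp': withdraw if present
        if t in self.tuples:
            self.tuples.remove(t)
            return t
        return None

    def register_will(self, agent_id, will):
        self.wills[agent_id] = will          # mobile object held at the server

    def agent_failed(self, agent_id):        # failure-detector callback
        will = self.wills.pop(agent_id, None)
        if will:
            will(self)                       # will uses the normal operations

ts = TupleSpace()
ts.out(("active", "agent-42"))               # agent adds itself to the list
ts.register_will("agent-42",
                 lambda space: space.inp(("active", "agent-42")))
ts.agent_failed("agent-42")                  # sudden failure: will cleans up
print(ts.tuples)                             # [] -- the list is consistent again
```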
{"title":"Using agent wills to provide fault-tolerance in distributed shared memory systems","authors":"A. Rowstron","doi":"10.1109/EMPDP.2000.823426","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823426","url":null,"abstract":"In this paper we describe how we use mobile objects to provide distributed programs coordinating through a persistent distributed shared memory (DSM) with tolerance to sudden agent failure, and use the increasingly popular Linda-like tuple space languages as an example for implementation of the concept. In programs coordinating and communicating through a DSM a data structure is shared between multiple agents, and the agents update the shared structure directly. However, if an agent should suddenly fail it is often hard for the agents to make the data structures consistent with the new application state. For example consider if a data structure contains a list of active agents. In such a case, transactions can be used when adding and removing agent names from the list ensuring that that the data structure is consistent and does not become corrupted should an agent fail. However If failure of the agent occurs after the name has been added, how does the application ensure the list is correct? We argue that using mobile objects we can provide wills for the agents to effectively enable them to ensure the shared data structure is application consistent, even once they have Sailed We show how we have integrated the use of agent wills into a Linda system and show that we have not increased the complexity, of program writing. The integration is simple and general, does not alter the underlying semantics of the operations performed in the will and the use of mobility is transparent to the programmer.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134082106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

ViMPIOS, a "truly" portable MPI-IO implementation
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823386
Kurt Stockinger, E. Schikuta
We present ViMPIOS, a novel MPI-IO implementation based on ViPIOS, the Vienna Parallel Input Output System. ViMPIOS inherits the defining characteristics of ViPIOS, which make it a client-server based system focusing on cluster architectures. ViMPIOS stands out from all other MPI-IO implementations by its "truly" portable design, which allows applications not only to be transferred easily between parallel architectures but also to retain their original performance characteristics on the new platform as far as possible. This is achieved by the "smart" AI-blackboard module of ViPIOS, which is responsible for an appropriate data layout. Specifically, in this paper we concentrate on the algorithm that maps MPI-IO data structures onto the respective ViPIOS structures, and thus makes it possible to exploit the ViPIOS properties.
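ViPIOS's internal structures are not described in this abstract, but the MPI-IO interface that ViMPIOS implements is standard. A typical call sequence of the kind its mapping algorithm must translate, a derived datatype defining each process's file view followed by a collective write, looks like this (shown with mpi4py; the file name and layout are arbitrary):

```python
# Standard MPI-IO usage of the kind ViMPIOS must map onto ViPIOS structures
# (this shows the MPI-IO interface itself, not ViPIOS internals): each rank
# describes its block of a shared file with a derived datatype and all ranks
# write collectively.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 4
buf = np.full(n_local, rank, dtype="d")       # each rank writes its own values

# Derived datatype selecting this rank's contiguous block of the global array.
filetype = MPI.DOUBLE.Create_subarray([n_local * size], [n_local],
                                      [n_local * rank])
filetype.Commit()

fh = MPI.File.Open(comm, "data.bin", MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Set_view(0, MPI.DOUBLE, filetype)          # map the datatype onto the file
fh.Write_all(buf)                             # collective, layout-aware write
fh.Close()
filetype.Free()
```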
{"title":"ViMPIOS, a \"truly\" portable MPI-IO implementation","authors":"Kurt Stockinger, E. Schikuta","doi":"10.1109/EMPDP.2000.823386","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823386","url":null,"abstract":"We present ViMPIOS, a novel MPI-IO implementation based on ViPIOS, the Vienna Parallel Input Output System. ViMPIOS inherits the defining characteristics of ViPIOS, which makes it a client-server based system focusing on cluster architectures. ViMPIOS stands out from all other MPI-IO implementations by its \"truly\" portable design, which allows not only applications to be transferred between parallel architectures easily but also to keep their original performance characteristics on the new platform as far as possible. This is kept by the \"smart\" AI-blackboard module of ViPIOS, which is responsible for an appropriate data layout. Specifically in this paper we concentrate on the algorithm, which maps MPI-IO data structures on respective ViPIOS structures, and thus allows to exploit the ViPIOS properties.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117197697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Consistency requirements of distributed shared memory for Lamport's bakery algorithm for mutual exclusion
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823415
J. Brzeziński, D. Wawrzyniak
As is well known, Lamport's bakery algorithm for mutual exclusion of n processes is correct if a physically shared memory is used as the communication facility between processes. Applying the weaker consistency models (e.g. causal, processor, PRAM) available in replicated distributed shared memory (DSM) systems, appealing as they are for their possible performance improvements, may render the algorithm incorrect. This raises the consistency requirement problem: the problem of finding weaker DSM consistency models that are sufficient for the algorithm's correctness. In this paper, the consistency requirements of distributed shared memory for Lamport's bakery algorithm for mutual exclusion of n processes are considered. It is proven that the algorithm is correct with a consistency model resulting from a combination of sequential consistency and one of the weakest consistency models, PRAM, without explicit synchronisation. The combination is achieved by specifying the consistency model with write operations on shared locations.
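For reference, a compact rendering of Lamport's bakery algorithm; the shared arrays choosing and number are exactly the locations whose required consistency the paper analyzes. Under sequential consistency the busy-wait reads below behave correctly; under a weak model such as PRAM alone they may observe stale values:

```python
# Lamport's bakery algorithm for N processes (reference sketch, shown as
# ordinary Python over shared lists). With sequentially consistent memory
# the protocol guarantees mutual exclusion; under weaker DSM models the
# reads in the busy-wait loops may return stale values.

N = 4                                  # number of processes (example size)
choosing = [False] * N                 # True while process i picks its ticket
number = [0] * N                       # 0 means "not trying to enter"

def lock(i):
    choosing[i] = True
    number[i] = 1 + max(number)        # take a ticket above all tickets seen
    choosing[i] = False
    for j in range(N):
        while choosing[j]:             # wait until j has finished choosing
            pass
        # wait while j holds a smaller ticket (ties broken by process id)
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def unlock(i):
    number[i] = 0                      # release: stop competing
```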
{"title":"Consistency requirements of distributed shared memory for Lamport's bakery algorithm for mutual exclusion","authors":"J. Brzeziński, D. Wawrzyniak","doi":"10.1109/EMPDP.2000.823415","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823415","url":null,"abstract":"As is well known Lamport's Bakery algorithm for mutual exclusion of n processes is correct if a physically shared memory is used as the communication facility between processes. An application of weaker consistency models (e.g. causal, processor, PRAM), available in replicated distributed shared memory (DSM) systems appealing due to possible performance improvement may imply incorrectness of the algorithm. It raises consistency requirement problem, a problem of finding weaker consistency models of DSM that is sufficient for the algorithm correctness. In this paper, consistency requirements of distributed shared memory for Lamport's Bakery algorithm for mutual exclusion of n processes are considered It is proven that the algorithm is correct with a consistency model resulting from a combination of sequential consistency and one of the weakest consistency models, PRAM, without explicit synchronisation. The combination is achieved by specifying the consistency model with write operations on shared locations.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121345169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Heterogeneous client-server architecture for a virtual meeting environment
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823396
M. Masoodian, S. Luz
Magic Lounge is a shared virtual meeting environment which has been designed to support meetings between physically remote people who would like to interact with each other using any one of a number of heterogeneous communication devices. This paper describes the heterogeneous client-server architecture of the Magic Lounge which supports communication between PCs, PDAs, palmtops, and mobile telephones. This architecture combines a number of different technologies, including CORBA and MBone, to provide the necessary means of audio and textual communication between the users of different devices. This paper also discusses the various requirements of this type of meeting environment, as well as describing some of the Magic Lounge software tools and components which have been developed to provide intelligent communication services to its users.
{"title":"Heterogeneous client-server architecture for a virtual meeting environment","authors":"M. Masoodian, S. Luz","doi":"10.1109/EMPDP.2000.823396","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823396","url":null,"abstract":"Magic Lounge is a shared virtual meeting environment which has been designed to support meetings between physically remote people who would like to interact with each other using any one of a number of heterogeneous communication devices. This paper describes the heterogeneous client-server architecture of the Magic Lounge which supports communication between PCs, PDAs, palmtops, and mobile telephones. This architecture combines a number of different technologies, including CORBA and MBone, to provide the necessary means of audio and textual communication between the users of different devices. This paper also discusses the various requirements of this type of meeting environment, as well as describing some of the Magic Lounge software tools and components which have been developed to provide intelligent communication services to its users.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122633110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A robust multigrid solver on parallel computers
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823393
R. Montero, M. Prieto, I. Llorente, F. Tirado
In this paper two well-known robust multigrid solvers for anisotropic operators on structured grids are compared: alternating-plane smoothers with full coarsening, and plane smoothers combined with semicoarsening. The study takes into account not only numerical properties but also architectural ones, focusing on cache memory exploitation and parallel characteristics. Experimental results for the sequential algorithms have been obtained on two different systems based on the MIPS R10000 processor but with different L2 cache sizes (an SGI O2 workstation and an SGI Origin 2000 system). Two different parallel implementations of the latter robust approach have been considered. The first one has optimal parallel characteristics but, due to the deterioration of its convergence properties, its realistic efficiency is not satisfactory. In the second one, some processors remain idle for a short period of time in every multigrid cycle; however, the algorithm is more efficient since it preserves the numerical properties of the sequential version. Parallel experiments have also been carried out on a Cray T3E system.
{"title":"A robust multigrid solver on parallel computers","authors":"R. Montero, M. Prieto, I. Llorente, F. Tirado","doi":"10.1109/EMPDP.2000.823393","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823393","url":null,"abstract":"In this paper two well-known robust multigrid solvers for anisotropic operators on structured grids are compared: alternating-plane smoothers with full coarsening and plane smoothers combined with semicoarsening. The study takes into account not only numerical properties but also architectural ones, focusing on cache memory exploitation and parallel characteristics. Experimental results for the sequential algorithms have been obtained on two different systems based on the MIPS R10000 processor but with different L2 cache sizes (an SGI O2 workstation and an SGI Origin 2000 system). Two different parallel implementations for the latter robust approach have been considered. The first one has optimal parallel characteristics but due to deterioration of the convergence properties its realistic efficiency is not satisfactory. In the second one, some processors remain idle during a short period of time on every multigrid cycle, however the algorithm is more efficient since it preserves the numerical properties of the sequential version. Parallel experiments have also been taken on a Cray T3E system.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129405380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Predictability of bulk synchronous programs using MPI
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823402
A. Zavanella, Alessandro Milazzo
The BSP cost model provides a general framework for designing efficient and portable data-parallel algorithms. Execution costs of BSP programs are predicted by combining a limited number of program- and machine-dependent parameters. BSP programs can be written using several programming tools. In this work we explore the predictability of bulk synchronous programs implemented with the Message Passing Interface. Two classic computational geometry problems, the convex hull (CH) and the lower envelope (LE), are considered as case studies. Efficient BSP algorithms have been implemented using MPI and executed on three different parallel architectures: a Fujitsu AP1000 (distributed memory), a CRAY T3E (distributed shared memory), and a cluster of PCs (Backus). The paper compares the degree of predictability on these architectures, analysing the main sources of error.
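For reference, the standard BSP cost expression that such predictions combine: with machine parameters g (cost per word communicated) and l (barrier synchronisation cost), and per-superstep program parameters w_s (maximum local computation) and h_s (maximum words sent or received by any processor), the predicted execution time over S supersteps is

```latex
T = \sum_{s=1}^{S} \left( w_s + g\,h_s + l \right)
```

Comparing this prediction with measured run times on the three machines is what exposes the sources of error the paper analyses.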
{"title":"Predictability of bulk synchronous programs using MPI","authors":"A. Zavanella, Alessandro Milazzo","doi":"10.1109/EMPDP.2000.823402","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823402","url":null,"abstract":"The BSP cost model provides a general framework to design efficient and portable data-parallel algorithms. Execution costs of BSP programs are predicted combining a limited number of program and machine dependent parameters. BSP programs can be written using several programming tools. In this work we explore the predictability of bulk synchronous programs implemented with the Message Passing Interface. Two classic computational geometry problems: the convex hull (CH) and the lower envelope (LE) are considered as cases of study. Efficient BSP algorithms have been implemented using MPI and executed on three different parallel architectures: a Fujitsu AP1000 (distributed memory), a CRAY T3E (distributed shared memory) and a cluster of PCs (Backus). The paper compares the degree of predictability on these architectures, analysing the main sources of error.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121558055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}