2-D wavelet packet decomposition on multicomputers
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823430
M. Feil, A. Uhl
In this work we describe and analyze algorithms for 2-D wavelet packet (WP) decomposition on MIMD distributed-memory architectures. The main goal is to generalize earlier parallel WP algorithms, which are constrained to a number of processor elements equal to a power of 4. We discuss several optimizations and generalizations of data-parallel message-passing algorithms and finally compare the results obtained on a Cray T3D.
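The paper's parallel algorithms are not reproduced in the abstract; the sketch below is only a minimal serial model of the underlying idea, assuming a one-level 2-D Haar analysis step and a round-robin mapping of leaf subbands to an arbitrary processor count (both choices are illustrative, not the authors' scheme).

```python
import numpy as np

def haar_step_2d(x):
    """One 2-D Haar analysis step: split x into LL, LH, HL, HH subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row-pair averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row-pair differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return [ll, lh, hl, hh]

def wp_leaves(x, levels):
    """Full wavelet packet tree: recurse into all four subbands."""
    if levels == 0:
        return [x]
    return [leaf for sb in haar_step_2d(x) for leaf in wp_leaves(sb, levels - 1)]

# 4**levels leaf subbands, dealt round-robin to P processors;
# P is arbitrary here, not restricted to a power of 4.
image = np.random.rand(64, 64)
leaves = wp_leaves(image, levels=2)
P = 6                                     # hypothetical processor count
assignment = {p: leaves[p::P] for p in range(P)}
print({p: len(subbands) for p, subbands in assignment.items()})
```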
{"title":"2-D wavelet packet decomposition on multicomputers","authors":"M. Feil, A. Uhl","doi":"10.1109/EMPDP.2000.823430","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823430","url":null,"abstract":"In this work we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. The main goal is the generalization of former parallel WP algorithms which are constrained to a number of processor elements equal to a power of 4. We discuss several optimizations and generalizations of data parallel message passing algorithms and finally compare the results obtained on a Cray T3D.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116731240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specification-driven monitoring of TCP/IP
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823387
R. Hofmann, F. Lemmen
Specification-driven monitoring is a novel technique for systematically analyzing the functional and temporal behavior of a system, from the specification down to the implementation, with the help of monitoring. This paper briefly presents the method and the tools belonging to it. The main part comprises a measurement study of a TCP/IP protocol stack fully specified in SDL. This study shows how the TCP protocol stack was analyzed and improved in terms of correctness and performance. After a hard-to-find error in the runtime system was corrected, the throughput of the system improved by a factor of 10.
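As a rough, hypothetical illustration of checking temporal behavior against a specification (the event names and the latency bound below are invented for the example, not taken from the SDL tooling):

```python
import time

class Monitor:
    """Toy event monitor: records timestamped events and checks a
    specified ordering and latency bound (both hypothetical)."""
    def __init__(self):
        self.trace = []

    def record(self, event):
        self.trace.append((event, time.monotonic()))

    def check(self, first, then, max_latency):
        """Every 'first' must be followed by a 'then' within max_latency s."""
        pending = []
        for event, t in self.trace:
            if event == first:
                pending.append(t)
            elif event == then and pending:
                if t - pending.pop(0) > max_latency:
                    return False
        return not pending          # no unanswered 'first' events remain

mon = Monitor()
mon.record("tcp_send")              # hypothetical protocol-stack events
mon.record("tcp_ack")
print(mon.check("tcp_send", "tcp_ack", max_latency=0.5))
```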
{"title":"Specification-driven monitoring of TCP/IP","authors":"R. Hofmann, F. Lemmen","doi":"10.1109/EMPDP.2000.823387","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823387","url":null,"abstract":"Specification-driven monitoring is a novel technique for systematically analyzing the functional and temporal behavior of a system starting from specification to the implementation with the help of monitoring. This paper briefly shows the method and the tools belonging to it. The main part comprises a measurement study of a TCP/IP protocol stack fully specified in SDL. This study shows, how the TCP protocol stack was analyzed and improved in terms of correctness and performance. After correcting a difficult error in the runtime system, the throughput of the system improved by a factor of 10.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128634956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specification for reactive bulk-synchronous programming
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823411
Yifeng Chen
We extend bulk-synchronous programming (BSP) to incorporate reactive (i.e. non-terminating) programming. We propose a semantic model for BSP which allows a process to have infinitely many supersteps. The semantics reveals the essential difference between BSP and sequential specifications. Based on the model, a specification language, called the Super-Step Specification (SSS) language, is proposed to support modularised programming and to hide communication and synchronisation details in specifications. The notion of a public variable is proposed in order to substantially simplify reactive programming. The normal forms of BSP and SSS are identified, and complete sets of laws for the two languages are given. Finally, a few refinement laws are used to provide a BSP treatment of the dining philosophers problem, which illustrates the power of BSP reactive programming. Much of the formalism presented in this paper can also be applied to non-reactive programming.
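A minimal sketch of the reactive-BSP idea, with processes that never terminate on their own and compute one superstep at a time; the engine, the process protocol, and all names below are illustrative assumptions, not the SSS language:

```python
def pinger(pid, peer):
    """A non-terminating BSP process: each superstep it increments whatever
    it received and sends the result to its peer."""
    def step(inbox):
        value = (inbox[0] if inbox else 0) + 1
        return [(peer, value)]          # (destination, message) pairs
    return step

def reactive_bsp(steps, supersteps):
    """Toy BSP engine: a local computation phase, then a barrier that
    exchanges messages. 'supersteps' only bounds this demo; the model
    itself allows infinitely many supersteps."""
    inboxes = [[] for _ in steps]
    for s in range(supersteps):
        outboxes = [[] for _ in steps]
        for pid, step in enumerate(steps):        # local computation
            for dest, msg in step(inboxes[pid]):
                outboxes[dest].append(msg)
        inboxes = outboxes                        # barrier synchronisation
        print(f"superstep {s}: {inboxes}")

reactive_bsp([pinger(0, 1), pinger(1, 0)], supersteps=3)
```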
{"title":"Specification for reactive bulk-synchronous programming","authors":"Yifeng Chen","doi":"10.1109/EMPDP.2000.823411","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823411","url":null,"abstract":"Extends bulk-synchronous programming (BSP) to incorporate reactive (i.e. non-terminating) programming. We propose a semantic model for BSP which allows a process to have infinitely many supersteps. The semantics reveals the essential difference between BSP and sequential specifications. Based on the model, a specification language, called the Super-Step Specification (SSS) language, is proposed to support modularised programming and to hide communication and synchronisation details in specifications. The notion of a public variable is proposed in order to substantially simplify reactive programming. The normal forms of BSP and SSS are identified, and complete sets of laws for the two languages are given. Finally, a few refinement laws are used to provide a BSP treatment of the dining philosophers problem, which illustrates the power of BSP reactive programming. Much of the formalism presented in this paper can also be applied to non-reactive programming.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124029365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monitoring and debugging message passing applications with MPVisualizer
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823433
A. Cláudio, J. D. Cunha, M. B. Carmo
MPVisualizer (Message Passing Visualizer) is a tool for monitoring and debugging message-passing parallel applications. It has three components: the trace/replay mechanism, the graphical user interface, and a central component called the visualization engine. The engine, which plays the main role during the replay phase, builds an object-oriented model of the application. By taking full advantage of inheritance and polymorphism, the tool can be adapted to different message-passing environments and different graphical environments, and easily reprogrammed to detect specific predicates. The engine is also prepared to recognize race conditions.
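The tool's actual class hierarchy is not published in the abstract; the toy sketch below only shows how inheritance and polymorphism can decouple a replay engine from a concrete message-passing environment (all class names and the event format are hypothetical):

```python
from abc import ABC, abstractmethod

class MPEnvironment(ABC):
    """Abstract adapter for a message-passing environment; concrete
    subclasses let the engine target different libraries."""
    @abstractmethod
    def next_event(self):
        """Return the next (kind, src, dst, tag) trace event, or None."""

class PVMEnvironment(MPEnvironment):
    def __init__(self, trace):
        self.trace = list(trace)
    def next_event(self):
        return self.trace.pop(0) if self.trace else None

class VisualizationEngine:
    """Replays events from any MPEnvironment and checks user predicates."""
    def __init__(self, env, predicates):
        self.env, self.predicates = env, predicates
    def replay(self):
        while (ev := self.env.next_event()) is not None:
            for name, pred in self.predicates.items():
                if pred(ev):
                    print(f"predicate '{name}' matched: {ev}")

engine = VisualizationEngine(
    PVMEnvironment([("send", 0, 1, "a"), ("recv", 1, 0, "a")]),
    {"msg to rank 1": lambda ev: ev[2] == 1},
)
engine.replay()
```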
{"title":"Monitoring and debugging message passing applications with MPVisualizer","authors":"A. Cláudio, J. D. Cunha, M. B. Carmo","doi":"10.1109/EMPDP.2000.823433","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823433","url":null,"abstract":"MPVisualizer (Message Passing Visualizer) is a tool for the monitoring and debugging of message passing parallel applications with three components: the trace/replay mechanism, the graphical user interface and a central component, called visualization engine. The engine, which plays the main role during the replay phase, builds an object-oriented model of the application. Taking full advantage of inheritance and polymorphism the tool can be adapted to different message passing environments and different graphical environments, and easily reprogrammed to detect specific predicates. The engine is also prepared to recognize race conditions.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"740 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127572102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance and transparency of message passing and DSM services within the GENESIS operating system for managing parallelism on COWs
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823401
A. Goscinski, M. Hobbs, J. Silcock
The primary paradigms for building parallel applications for execution on clusters of workstations (COWs) are message passing (MP) and distributed shared memory (DSM). Unfortunately, currently available run-time environments and operating systems do not provide satisfactory levels of transparency or management support, and they support either MP or DSM but not both. We propose a unique and novel approach in which the MP and DSM services are provided to the application programmer as a cohesive and comprehensive set of parallel-processing servers that are integral components of an operating system. The performance of a number of common parallel applications, employing both MP (raw and PVM-based) and DSM, demonstrates the high quality of the proposed approach.
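GENESIS internals are not shown here; what follows is only a toy, single-address-space sketch of the idea of one service exposing both MP and DSM operations to the same application (the interface is entirely hypothetical):

```python
import queue

class ParallelService:
    """Toy stand-in for an OS-level service offering both message
    passing and DSM to the same application (interface hypothetical)."""
    def __init__(self, nprocs):
        self.mailboxes = [queue.Queue() for _ in range(nprocs)]  # MP side
        self.shared = {}                                         # DSM side

    # --- message-passing service ---
    def send(self, dst, msg):
        self.mailboxes[dst].put(msg)
    def recv(self, pid):
        return self.mailboxes[pid].get()

    # --- DSM service ---
    def write(self, key, value):
        self.shared[key] = value
    def read(self, key):
        return self.shared[key]

svc = ParallelService(nprocs=2)
svc.send(1, "hello")          # MP style
svc.write("x", 42)            # DSM style
print(svc.recv(1), svc.read("x"))
```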
{"title":"Performance and transparency of message passing and DSM services within the GENESIS operating system for managing parallelism on COWs","authors":"A. Goscinski, M. Hobbs, J. Silcock","doi":"10.1109/EMPDP.2000.823401","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823401","url":null,"abstract":"The primary paradigm for building parallel applications for execution on clusters of workstations (COWs) can be generalised into message passing (MP) and distributed shared memory (DSM). Unfortunately the currently available run-time environments and operating systems do not provide satisfactory levels of transparency, management support, and only support either MP or DSM. We propose a unique and novel approach where the MP and DSM services are provided to the application programmer as a cohesive and comprehensive set of parallel processing servers that are integral components of an operating system. The performance of a number of common parallel applications, employing both MP (raw and PVM based) and DSM, demonstrate the high quality of the proposed approach.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124067411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PQE HPF: a library for exploiting the capabilities of a PQE-1 heterogeneous parallel architecture
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823408
R. Baraglia, R. Ferrini, D. Laforenza, P. Palmerini, R. Perego
Heterogeneous computing is a special form of parallel and distributed computing in which computations are performed either by a single autonomous computer operating in both SIMD and MIMD modes or by a number of connected autonomous computers. In multimode heterogeneous computing systems, tasks can be executed in SIMD and MIMD modes simultaneously. In this paper we present PQE HPF, a High Performance Fortran (HPF) based programming library that allows one to exploit the MIMD and SIMD capabilities offered by PQE-1, a multimode parallel architecture. Two different implementations of a well-known application, using HPF and PQE HPF respectively, were used to evaluate the overheads introduced on top of the machine's runtime system. Preliminary tests, conducted by running the case-study application on the first PQE-1 prototype, show good results and encourage us to devote more effort to implementing real production parallel codes on a similar architecture.
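PQE HPF itself is not publicly sketched in the abstract; the snippet below is only a loose analogy for mixing a data-parallel (SIMD-style) phase with a task-parallel (MIMD-style) phase in one program, not the PQE-1 architecture or the library's API:

```python
import numpy as np
from multiprocessing import Pool

def simd_part(x):
    """Data-parallel (SIMD-like) phase: one vectorized operation."""
    return np.sqrt(x)

def mimd_part(chunk):
    """Task-parallel (MIMD-like) phase: independent work per process."""
    return float(chunk.sum())

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)
    y = simd_part(data)                       # SIMD-style stage
    with Pool(4) as pool:                     # MIMD-style stage
        partials = pool.map(mimd_part, np.array_split(y, 4))
    print(sum(partials))
```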
{"title":"PQE HPF-a library for exploiting the capabilities of a PQE-1 heterogeneous parallel architecture","authors":"R. Baraglia, R. Ferrini, D. Laforenza, P. Palmerini, R. Perego","doi":"10.1109/EMPDP.2000.823408","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823408","url":null,"abstract":"Heterogeneous computing is a special form of parallel and distributed computing where computations are performed using a single autonomous computer operating in both SIMD and MIMD modes, or using a number of connected autonomous computers. In multimode system heterogeneous computing, tasks can be executed in both SIMD and MIMD simultaneously. In this paper, we present PQE HPF, a High Performance Fortran (HPF) based programming library which allows one to exploit the MIMD and SIMD capabilities offered by PQE-1, a multimode parallel architecture. Two different implementations of a well-known application, using HPF and PQE HPF respectively, were used to evaluate the overheads introduced over the machine's runtime system. Preliminary tests, conducted by running the case study application on the first PQE-1 prototype, show good results and encourage us to dedicate more effort to implement real production parallel codes on a similar architecture.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127989233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
First steps in metacomputing with Amica
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823412
T. Fink, S. Kindermann
The metacomputing system Amica is a new approach to supporting the development of coarse-grained applications for distributed, dynamic, heterogeneous systems (e.g. computers linked to the Internet). It aims at the location-transparent and convenient design of distributed applications and at the easy integration of legacy systems. Applications are described in the form of application graphs based on a predefined set of reusable components and connectors. This graph is dynamically interpreted using the Amica infrastructure. Amica provides uniform access to computational resources using the well-known factory pattern. Additionally, a memory subsystem supports the location-transparent use of complex data objects, which may be replicated to increase access speed. To transfer data, specific network resources can be used. We report initial experiences using Amica for a computationally intensive real-world problem: the parallel simulation of cellular mobile systems. Measurements show that Amica, even at this early stage, provides a convenient interface and sufficient efficiency for building distributed applications that utilize heterogeneous dynamic resources.
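As a minimal sketch of the factory-pattern resource access mentioned above (the resource kinds and class names are hypothetical, not Amica's):

```python
from abc import ABC, abstractmethod

class Resource(ABC):
    """A computational resource behind a uniform interface."""
    @abstractmethod
    def run(self, task, *args): ...

class LocalResource(Resource):
    def run(self, task, *args):
        return task(*args)

class ResourceFactory:
    """Factory-pattern access to resources, keyed by a resource kind."""
    _kinds = {"local": LocalResource}
    @classmethod
    def create(cls, kind):
        return cls._kinds[kind]()      # callers never name concrete classes

res = ResourceFactory.create("local")
print(res.run(pow, 2, 10))            # -> 1024
```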
{"title":"First steps in metacomputing with Amica","authors":"T. Fink, S. Kindermann","doi":"10.1109/EMPDP.2000.823412","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823412","url":null,"abstract":"The metacomputing system Amica is a new approach to support the development of coarse-grained applications for distributed dynamic heterogeneous systems (e.g. computers linked to the Internet). It aims at the location-transparent and convenient design of distributed applications and at the easy integration of legacy systems. Applications are described in the form of application graphs based on a predefined set of reusable components and connectors. This graph is dynamically interpreted using the Amica infrastructure. Amica provides uniform access to computational resources using the well-known factory pattern. Additionally, a memory subsystem supports the location-transparent use of complex data objects which may be replicated to increase access speed. To transfer data, specific network resources can be used. We report initial experiences with using Amica for a computationally intensive real-world problem: the parallel simulation of cellular mobile systems. Measurements show that Amica, even in its premature stage, provides a convenient interface and sufficient efficiency to build distributed applications utilizing heterogeneous dynamic resources.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121209615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Groups in bulk synchronous parallel computing
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823418
J. González, C. León, F. Piccoli, A. M. Printista, J. R. García, C. Rodríguez, F. D. Sande
An extension of the Bulk Synchronous Parallel (BSP) model that allows the use of asynchronous BSP groups of processors is presented. In this model, called Nested BSP, processor groups can be divided, and processors in a group synchronize through group-dependent collective operations, generalizing the concept of barrier synchronization. A classification of problems and algorithms according to their parallel input-output distribution is provided. For one of these problem classes, the so-called common-common class, we present a general strategy for deriving efficient parallel algorithms. Algorithms in this class allow arbitrary division of the processor subsets, making it easier for the underlying BSP software to divide the network into independent subnetworks and minimizing the impact of traffic in the rest of the network on the predicted cost. The expressiveness of the model is exemplified through three divide-and-conquer programs. The computational results for these programs on six high-performance supercomputers show both the accuracy of the model and the optimality of the speedups for the class of problems considered.
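A sequential toy model of the Nested BSP group idea: a processor group splits into arbitrary-sized subgroups (no power-of-two restriction), each subgroup proceeds independently, and results merge at a group-level synchronization point. The function below only models these semantics, not the authors' runtime:

```python
def nested_bsp_sum(group, data):
    """Divide-and-conquer over a processor group: split the group into two
    arbitrary-sized subgroups, let each work independently on its share of
    the data, then combine at the (modelled) group barrier."""
    if len(group) == 1 or len(data) <= 1:
        return sum(data)                       # sequential base case
    k = len(group) // 2                        # arbitrary split point
    left, right = group[:k], group[k:]
    mid = len(data) * len(left) // len(group)  # data split follows the group
    # Each recursive call models an independent subgroup between barriers.
    return nested_bsp_sum(left, data[:mid]) + nested_bsp_sum(right, data[mid:])

print(nested_bsp_sum(group=list(range(6)), data=list(range(100))))  # 4950
```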
{"title":"Groups in bulk synchronous parallel computing","authors":"J. González, C. León, F. Piccoli, A. M. Printista, J. R. García, C. Rodríguez, F. D. Sande","doi":"10.1109/EMPDP.2000.823418","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823418","url":null,"abstract":"An extension to the Bulk Synchronous Parallel Model (BSP) to allow the use of asynchronous BSP groups of processors is presented. In this model, called Nested BSP, processor groups can be divided and processors in a group synchronize through group dependent collective operations generalizing the concept of barrier synchronization. A classification of problems and algorithms attending to their parallel input-output distribution is provided. For one of these problem classes, the called common-common class, we present a general strategy to derive efficient parallel algorithms. Algorithms belonging to this class allow the arbitrary division of the processor subsets, easing the opportunities of the underlying BSP software to divide the network in independent sub networks, minimizing the impact of the traffic in the rest of the network in the predicted cost. The expressiveness of the model is exemplified through three divide and conquer programs. The computational results for these programs in six high performance supercomputers show both the accuracy of the model and the optimality of the speedups for the class of problems considered.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129394391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A performance simulation technique for distributed programs: application to an SOR iterative solver
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823432
Rocco Aversa, B. D. Martino, N. Mazzocca, Umberto Villano
In this paper we present the application of an approach for the performance prediction of message-passing programs to a PVM code implementing an iterative solver based on the Successive Over-Relaxation (SOR) method. The approach, based on the integration of static program analysis and simulation techniques, aims at significantly reducing the time needed to simulate the execution of a message-passing program. We show how the proposed technique can provide the user, in a reasonable elaboration time, with a characterization of iterative regular programs such as the one proposed, in terms of idle, CPU, communication, and synchronization times in heterogeneous and network computing environments.
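The paper predicts the performance of a PVM parallelization of SOR; for reference, the sketch below is just the plain sequential SOR kernel, not the authors' distributed code or their simulation technique:

```python
import numpy as np

def sor(A, b, omega=1.5, tol=1e-10, max_iter=10_000):
    """Sequential Successive Over-Relaxation for A x = b:
    x_i <- (1 - omega) x_i + (omega / a_ii) (b_i - sum_{j != i} a_ij x_j),
    sweeping i in order so updated values are used immediately."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x_old[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            break
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([15.0, 10.0, 10.0])
print(sor(A, b))   # converges to the solution of A x = b
```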
{"title":"A performance simulation technique for distributed programs: application to an SOR iterative solver","authors":"Rocco Aversa, B. D. Martino, N. Mazzocca, Umberto Villano","doi":"10.1109/EMPDP.2000.823432","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823432","url":null,"abstract":"In this paper we present the application of an approach for the performance prediction of message passing programs, to a PVM code implementing an iterative solver based on the Successive OverRelaxation method. The approach, based on the integration of static program analysis and simulation techniques, is aimed at significantly speeding up the time needed for simulating the execution of a message passing program. We show how the proposed technique can provide, in a reasonable elaboration time, the user for a characterization of iterative regular programs as the proposed one, in terms of idle-, cpu-, communication and synchronization time in Heterogeneous and Network Computing environments.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117231177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalability analysis of parallel systems with multiple components of work
Pub Date: 2000-01-19 | DOI: 10.1109/EMPDP.2000.823410
E. Tambouris, P. V. Santen
The generic fixed-value efficiency (FVE) method is proposed in order to study the scalability of parallel algorithms with multiple components of work. The generic FVE method is based on the isoefficiency method. Unlike isoefficiency, however, this method may be applied to parallel algorithm-machine combinations (parallel systems) where the relationship between the total work and its components is not predetermined by the decomposition method or by any other factor. The objective of the method is to derive the relationships between the total work and its components in order for the efficiency of the parallel system to be preserved. The use of the method is demonstrated by analysing the impact of the sparsity of the input data on the scalability of a static state estimator for power systems.
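The FVE method itself is not reproduced here, but the isoefficiency relation it builds on is standard: holding the efficiency E fixed ties the required growth in total work W to the growth of the parallel overhead T_o.

```latex
% Standard isoefficiency relation (the FVE method generalizes this idea).
% W   : problem size (total useful work)
% T_o : total overhead of the parallel system on p processors
% E   : parallel efficiency
E = \frac{W}{W + T_o(W, p)}
\quad\Longrightarrow\quad
W = \frac{E}{1 - E}\, T_o(W, p) = K \, T_o(W, p),
\qquad K = \frac{E}{1 - E} \text{ fixed when } E \text{ is held constant.}
```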
{"title":"Scalability analysis of parallel systems with multiple components of work","authors":"E. Tambouris, P. V. Santen","doi":"10.1109/EMPDP.2000.823410","DOIUrl":"https://doi.org/10.1109/EMPDP.2000.823410","url":null,"abstract":"The generic fixed-value efficiency (FVE) method is proposed in order to study the scalability of parallel algorithms with multiple components of work. The generic FVE method is based on the isoefficiency method. Unlike isoefficiency, however, this method may be applied to parallel algorithm-machine combinations (parallel systems) where the relationship between the total work and its components is not predetermined by the decomposition method or by any other factor. The objective of the method is to derive the relationships between the total work and its components in order for the efficiency of the parallel system to be preserved. The use of the method is demonstrated by analysing the impact of the sparsity of the input data on the scalability of a static state estimator for power systems.","PeriodicalId":128020,"journal":{"name":"Proceedings 8th Euromicro Workshop on Parallel and Distributed Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122657529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}