Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905041
Venue: Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing
Title: Elimination of redundant messages with a two-pass static analysis algorithm
Authors: A. Girault
Abstract: Eliminating redundant messages in distributed programs reduces communication overhead and thus improves the overall performance of the distributed program, so a lot of recent work has pursued this goal. We present an algorithm for eliminating redundant valued messages in parallel programs that have been distributed automatically. The algorithm works on programs whose control flow is as general as possible, i.e., contains gotos: precisely, the control flow is a finite deterministic automaton with a DAG of actions in each state. Our algorithm proceeds in two passes: first, a global data-flow analysis computes, for each state of the automaton, the set of distant variables that are known at the beginning of the state; then, a local elimination removes redundant messages locally in each state of the automaton. We present the algorithms along with an example.
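The global pass described in the abstract is a classic forward data-flow fixed point: what is known at a state's entry is the intersection, over its predecessors, of what is known at their exit. The sketch below illustrates that shape under assumed data structures (a successor map and a per-state set of variables received inside the state); it is not the paper's actual algorithm.

```python
def known_at_entry(states, initial, transfer):
    """Forward data-flow fixed point over an automaton (illustrative).

    states:   dict mapping state -> list of successor states
    initial:  the automaton's initial state
    transfer: dict mapping state -> set of distant variables whose
              values are received (hence become known) inside the state

    Returns, for each state, the set of distant variables guaranteed
    to be known on entry: the intersection, over all predecessors,
    of what is known when leaving them.
    """
    all_vars = set().union(*transfer.values()) if transfer else set()
    # Optimistic initialisation: everything known, except at the start.
    known = {s: set(all_vars) for s in states}
    known[initial] = set()
    changed = True
    while changed:
        changed = False
        for s, succs in states.items():
            out = known[s] | transfer[s]      # known when leaving s
            for t in succs:
                new = known[t] & out          # meet = intersection
                if new != known[t]:
                    known[t] = new
                    changed = True
    return known

# Diamond automaton: both branches receive y, so y is known at D's entry
states = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
transfer = {'A': {'x'}, 'B': {'y'}, 'C': {'y'}, 'D': set()}
known = known_at_entry(states, 'A', transfer)
# a message re-sending y into state D would be redundant
```

Because sets only shrink from the optimistic start, the iteration terminates at the greatest fixed point; a message carrying a variable already in `known` at the receiving point is the kind the local pass can eliminate.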
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905069
Title: Off-line real-time fault-tolerant scheduling
Authors: C. Dima, A. Girault, C. Lavarenne, Y. Sorel
Abstract: We address the problem of off-line fault-tolerant scheduling of an algorithm onto a multiprocessor architecture with distributed memory, and provide a generic algorithm which solves this problem. We take into account two kinds of failures: fail-silent and omission. The basic technique we use is the replication of operations and data communications. We then discuss the principles which govern the execution of schedules with replication under the state-machine and primary/backup arbitrations between replicas. We also show how to compute the execution date for each operation and the timeouts used for detecting failures. We end with a heuristic which, using this calculus, computes a possibly non-optimal schedule by finding plain schedules for each failure pattern and then combining them into a single schedule with replication.
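To make the replication idea concrete: tolerating k fail-silent processor failures requires each operation to run as k+1 replicas on k+1 distinct processors. The toy scheduler below places replicas greedily on the least-loaded processors; it is only a sketch of the general principle, not the paper's heuristic (which works per failure pattern and then merges the resulting schedules).

```python
def schedule_with_replication(tasks, durations, n_procs, k):
    """Greedy active-replication scheduler (illustrative only).

    To tolerate up to k fail-silent processor failures, each task is
    replicated k+1 times on k+1 distinct processors.  Replicas go to
    the currently least-loaded processors; the returned schedule maps
    task -> list of (processor, start, end) triples.
    """
    assert k + 1 <= n_procs, "need at least k+1 processors"
    ready = [0.0] * n_procs          # next free date per processor
    schedule = {}
    for t in tasks:
        # pick the k+1 least-loaded processors, all distinct
        procs = sorted(range(n_procs), key=lambda p: ready[p])[:k + 1]
        placements = []
        for p in procs:
            start = ready[p]
            end = start + durations[t]
            ready[p] = end
            placements.append((p, start, end))
        schedule[t] = placements
    return schedule

# two tasks, three processors, tolerate one failure (k = 1)
sched = schedule_with_replication(['a', 'b'], {'a': 2, 'b': 3}, 3, 1)
```

The computed end dates play the role of the abstract's execution dates; in a real fault-tolerant schedule a timeout slightly beyond a replica's expected end date is what signals a fail-silent processor.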
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905052
Title: Efficient BSR-based parallel algorithms for geometrical problems
Authors: D. Semé, J. Myoupo
Abstract: This paper presents BSR parallel algorithms for three geometrical problems: point location, convex hull, and smallest enclosing rectangle. These problems are solved in constant time using the BSR (broadcasting with selective reduction) model introduced by Akl and Guenther in 1989. The first algorithm uses O(N) processors (N is the number of edges of the polygon R). The second uses O(N'^2) processors (N' is the number of points), and the third also uses O(N'^2) processors (it needs the convex hull as input) to solve the smallest enclosing rectangle problem. These new results suggest that many other geometrical problems can be solved in constant time using the BSR model.
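A BSR step broadcasts data to all processors, lets each processor select what is relevant to it, and reduces the selected values in one unit of time, which is how constant-time bounds arise. The sketch below simulates that shape sequentially for point location in a convex polygon: one "processor" per edge, an AND as the reduction. It is an illustration of the model's flavor, not the paper's algorithm.

```python
def inside_convex_polygon(point, vertices):
    """Simulated BSR step for point location in a convex polygon.

    In BSR the query point would be broadcast to one processor per
    edge; each processor tests the point against its edge, and a
    single AND-reduction yields the answer, all in constant time.
    Here the N edge tests simply run in a loop.
    `vertices` must be in counter-clockwise order.
    """
    px, py = point
    n = len(vertices)
    results = []
    for i in range(n):                    # one "processor" per edge
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # cross product: is the point on the left of edge i?
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        results.append(cross >= 0)
    return all(results)                   # the AND-reduction

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # unit square, CCW
```

With N processors each test runs concurrently, so the whole query costs one BSR step, matching the O(N)-processor, constant-time bound quoted for the first algorithm.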
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905043
Title: Integrating HPF in a skeleton based parallel language
Authors: C. Gennaro, R. Perego, S. Orlando
Abstract: Although HPF allows programmers to express data-parallel computations in a portable, high-level way, it is widely accepted that many important parallel applications cannot be implemented efficiently following a pure data-parallel paradigm. For these applications, rather than writing a single data-parallel program, it is more profitable to subdivide the whole computation into several data-parallel pieces that run concurrently and cooperate, thus exploiting task parallelism. This paper discusses the integration of HPF with SkIE, a skeleton-based coordination language implemented on top of MPI (Message Passing Interface), which makes it possible to describe complex parallel computational structures. We show how HPF can be used inside common forms of parallelism, e.g. pipelines and processor farms, and we present experimental results for a sample application.
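The two skeletons the abstract names, pipeline and processor farm, have a very small coordination core. The sketch below shows that shape in plain Python (threads standing in for MPI processes, ordinary functions standing in for HPF data-parallel modules); it only illustrates the coordination pattern, not SkIE's actual syntax or runtime.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, tasks, n_workers=4):
    """'Processor farm' skeleton: a pool of identical workers consumes
    a stream of independent tasks.  In SkIE each worker could itself
    be a data-parallel (e.g. HPF) module; here it is a plain function.
    map() preserves the input order of the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))

def pipeline(stages, tasks):
    """Pipeline skeleton: each task flows through the stages in order.
    A real implementation overlaps the stages; this sequential
    stand-in only shows the composition."""
    for stage in stages:
        tasks = [stage(t) for t in tasks]
    return list(tasks)
```

The point of the skeleton approach is exactly this separation: the coordination structure above is fixed and reusable, while the `worker` and `stage` bodies carry the application's data-parallel work.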
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905006
Title: Local-area and wide-area computing: architectures and tools
Authors: D. Tavangarian
Abstract: In this paper, recent architectural approaches and tools for local-area and wide-area computing using clusters of servers, workstations, and PCs as multicomputers (i.e., parallel computing on workstation clusters) are classified and described. The goal of such systems is to concentrate the available computing resources on solving computing problems. A special focus of this contribution is recent research in cost-efficient parallel computing with standard-component multicomputer systems, concentrating on locally organized clusters for local-area computing and on wide-area multiclusters (hyperclusters, or clusters of clusters) for wide-area computing. Selected examples demonstrate the improvements obtained through high-speed interconnection networks and optimized protocol system architectures in local-area systems, and through optimized organizations in wide-area systems.
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905047
Title: SIMLAB-a simulation environment for storage area networks
Authors: P. Berenbrink, A. Brinkmann, C. Scheideler
Abstract: In this paper we present SIMLAB, a simulation environment for storage area networks. SIMLAB is part of the PRESTO project, a joint project of the Electrical Engineering Department and the Computer Science Department of Paderborn University. The aim of the PRESTO project is to construct a scalable and resource-efficient storage network that can support the real-time delivery of data. SIMLAB was implemented to aid the development and verification of distributed algorithms for this storage network, but it has been designed so that it can also be used to simulate many other types of networking problems. SIMLAB is based on C++ and common libraries and input/output formats, which ensures that it can be used on many different platforms. We therefore expect SIMLAB to also be useful to others working on similar problems.
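At the core of a network simulator like the one described sits a discrete-event engine: a time-ordered queue of pending events that the simulator pops and executes, advancing a virtual clock. The Python sketch below shows that generic core (SIMLAB itself is C++, and its actual API is not shown here).

```python
import heapq

class EventSimulator:
    """Minimal discrete-event simulation core, the kind of engine a
    network simulator builds on (illustrative sketch, not SIMLAB's
    actual interface)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []    # min-heap of (time, seq, action)
        self._seq = 0       # tie-breaker for events at the same time

    def schedule(self, delay, action):
        """Schedule a zero-argument callable `delay` time units ahead."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        """Pop events in timestamp order until the queue is empty."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

sim = EventSimulator()
log = []
sim.schedule(2.0, lambda: log.append(('pkt-b arrives', sim.now)))
sim.schedule(1.0, lambda: log.append(('pkt-a arrives', sim.now)))
sim.run()
```

Events execute in timestamp order regardless of scheduling order, which is what lets such an engine model packet deliveries, disk completions, and timeouts uniformly.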
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.904960
Title: On the impact of message packetization in networks of workstations with irregular topology
Authors: Xavier Molero, F. Silla, V. Santonja, J. Duato
Abstract: Networks of workstations (NOWs) are becoming an increasingly popular alternative to parallel computers, both for applications with high resource demands (such as memory capacity and input/output storage space) and for small-scale parallel computing. The software messaging layers in these systems usually become a bottleneck because of the overhead they introduce. Some proposals, like FM and BIP, considerably reduce this overhead by splitting long messages into several packets, and have been shown to improve communication performance. However, the effect of message packetization on the interconnection network has not yet been analyzed. In this paper we examine message packetization from the point of view of the interconnection network in the context of bimodal traffic, considering two routing algorithms: up*/down* and minimal adaptive routing. Our study shows that with up*/down* routing, message packetization dramatically increases latency and reduces throughput for both long and short messages. With minimal adaptive routing, short messages can benefit from packetization, but at the cost of increased latency for long messages. In either case, network throughput is considerably reduced.
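The bandwidth cost of packetization is easy to see with a back-of-the-envelope model: every extra packet repeats the header, so the same payload occupies more network flits. The sketch below computes latency under a deliberately simplified cut-through model (my own toy formula, not the paper's simulation setup).

```python
import math

def packetized_latency(msg_flits, pkt_payload, header_flits,
                       hops, t_flit=1.0, t_route=1.0):
    """Toy latency of a packetized message under cut-through routing.

    Each packet carries `pkt_payload` data flits plus `header_flits`
    of overhead.  Packets pipeline one behind another, so the last
    flit arrives after the routing delay along the path plus the
    serialized transmission of every flit of every packet.
    """
    n_pkts = math.ceil(msg_flits / pkt_payload)
    total_flits = n_pkts * (pkt_payload + header_flits)
    return hops * t_route + total_flits * t_flit
```

Even this crude model shows the trend the study measures: a 1024-flit message sent whole carries one header, while the same message in 64-flit packets carries sixteen, inflating the occupied bandwidth and hence the latency seen by long messages. What the model cannot show, and the paper does, is the interaction with the routing algorithm and with blocked packets in an irregular topology.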
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905076
Title: Predictability of cellular programs implemented with CAMELot
Authors: G. Folino, G. Spezzano
Abstract: In this paper we present a performance model to analyse the scalability and predict the performance of cellular programs developed with the CAMELot system. CAMELot is a problem-solving environment that uses the cellular automata model to model and simulate dynamic complex phenomena. The environment supports CARPET, a purpose-built language for programming and steering cellular applications. The proposed performance model is based on the isoefficiency method. Isoefficiency is a scalability measure that determines whether a parallel system can preserve its efficiency by increasing the problem size as the number of processors grows. With isoefficiency analysis we can test a program's performance on a few processors and then predict its performance on a larger number of processors; it also lets us study system behavior when other hardware parameters, such as processor and communication speeds, change. Scalability prediction examples are given for two-dimensional and three-dimensional cellular programs on a Meiko CS-2 parallel machine.
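The isoefficiency method rests on the relation E = 1 / (1 + To(W, p) / W), where W is the serial work and To the total parallel overhead: efficiency stays constant exactly when W grows fast enough to keep To/W constant. The sketch below computes, for an assumed overhead function (a made-up example, not CAMELot's measured model), how large the problem must be to hold a target efficiency.

```python
import math

def efficiency(W, p, overhead):
    """Parallel efficiency E = 1 / (1 + To(W, p) / W), where W is the
    problem size (serial work) and To(W, p) the total overhead."""
    return 1.0 / (1.0 + overhead(W, p) / W)

def isoefficiency_size(p, overhead, target_e, hi=1e12):
    """Smallest problem size W keeping efficiency >= target_e on p
    processors, found by bisection (assumes efficiency is
    non-decreasing in W, true when To grows sublinearly in W)."""
    lo = 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        if efficiency(mid, p, overhead) >= target_e:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical overhead: 10 time units per processor, times log2(p)
# (a broadcast-style cost, independent of W).
to = lambda W, p: 10 * p * math.log2(p)
w_needed_8 = isoefficiency_size(8, to, 0.8)   # analytically 4*To = 960
```

For this overhead the isoefficiency function is W = Θ(p log p): doubling the processors forces a slightly more than doubled problem size to hold 80% efficiency, which is the kind of prediction the paper validates on the Meiko CS-2.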
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905018
Title: Running multithreaded applications in exokernel-based systems: porting CThreads to Xok
Authors: E. Artiaga, Marisa Gil
Abstract: Exokernel-based systems provide efficient access to the system's actual hardware resources. Parallel applications can take advantage of this kind of access and adapt to the resources actually available in order to increase performance. In this paper we present an extension that allows multithreaded applications to run on an Intel-based exokernel system; for this purpose, we have ported a user-level threads package to that environment. Our final goal is a multiprocessor exokernel version capable of running parallel applications on top of it. We use the exokernel interface to access the physical execution resources, and we have designed the lower layer of the multithreading library to use them.
Pub Date: 2001-02-07 | DOI: 10.1109/EMPDP.2001.905059
Title: Integrating pervasive information acquisition to enhance workspace awareness
Authors: A. Ferscha
Abstract: Workspace awareness, the "... up-to-the-moment understanding of another person's interaction with a shared workspace", involves knowing who is working in the workspace, where individuals are working, what they are doing or going to do, how and when they are executing their work, and what their motivation for doing it is (why). Traditional awareness systems use dynamic user-behavior data collected by monitoring events from I/O devices (keyboard, mouse, touchscreen) at the interface. To preserve and maintain a more intuitive fidelity of awareness, we extend our workspace awareness system TEAMSPACE to collect and exploit awareness information from the user's physical activities in the workspace, such as hand gestures and body movement, using position and orientation tracking technologies. User activities beyond interaction with desktop computing facilities are thus seamlessly integrated into a shared virtual workspace, opening a whole new dimension of awareness abilities.