Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905044
L. Courtrai, Y. Mahéo, Frédéric Raimbault
The availability of local high-performance networks already makes workstation clusters a serious alternative for parallel computing. However, a high-level and effective programming language for such architectures is still missing. Recent work shows growing interest in Java for cluster programming. One of the main issues is handling the communication of objects efficiently enough to really take advantage of the network speed. The paper presents an alternative to the standard serialization process by proposing a Java object communication library. Object allocation is controlled in such a way that the transfer of objects between two nodes reduces to a direct memory-to-memory dump. We show how specific allocation mechanisms can cooperate with a Java Virtual Machine so that fast transfers of graphs of objects can be achieved. Experimental results are given for basic operations and for a genetic programming application; they demonstrate a dramatic change in the transfer speed.
Title: Java objects communication on a high performance network. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905049
D. Talia
Cellular automata are a nature-inspired parallel processing model. The model was proposed decades ago by J. von Neumann to simulate complex dynamical processes. In the past two decades, several models of cellular automata that differ from the original one proposed by von Neumann have been defined for modeling real-world systems and phenomena. This paper describes the design and implementation of standard and nonstandard parallel cellular automata in the CARPET language. CARPET is a cellular-automata-based language that has been implemented on MIMD parallel computers. The language is specifically designed for programming cellular computations and supports concise and efficient coding of parallel cellular algorithms. The paper analyzes the main features of the language and describes how they can be exploited to implement different cellular automata on parallel computers, from the standard model to its modifications and generalizations. Inhomogeneous, partitioned, asynchronous, and probabilistic cellular automata programmed in CARPET are presented.
Title: Implementing standard and nonstandard parallel cellular automata in CARPET. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
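To make the distinction between a standard and a probabilistic cellular automaton concrete, here is a minimal sketch in Python. It is not CARPET code and does not reflect the CARPET syntax or its parallel implementation. The function names (step, parity_rule, make_probabilistic), the von Neumann neighbourhood with wrap-around borders and the parity rule are all assumptions chosen for illustration.

```python
import random

def step(grid, rule):
    """One synchronous update: every new cell is computed from the old grid."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # von Neumann neighbourhood (N, S, W, E) with wrap-around borders
            neigh = [grid[(r - 1) % rows][c], grid[(r + 1) % rows][c],
                     grid[r][(c - 1) % cols], grid[r][(c + 1) % cols]]
            new[r][c] = rule(grid[r][c], neigh)
    return new

def parity_rule(cell, neigh):
    """Standard (deterministic) rule: parity of the four neighbours."""
    return sum(neigh) % 2

def make_probabilistic(rule, p, rng):
    """Probabilistic variant: apply `rule` with probability p, else keep the cell."""
    def wrapped(cell, neigh):
        return rule(cell, neigh) if rng.random() < p else cell
    return wrapped

grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
deterministic = step(grid, parity_rule)
noisy = step(grid, make_probabilistic(parity_rule, 0.5, random.Random(0)))
```

Because each new cell depends only on the old grid, the two loops over r and c can be split across processors, which is the property a parallel cellular language exploits.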
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905063
J. González, C. León, J. R. García, C. Rodríguez, F. D. Sande, F. Piccoli, A. M. Printista
The BSP model can be extended with a zero-cost synchronization mechanism that can be used when the number of messages due to be received is known. This mechanism, usually known as "oblivious synchronization", implies that different processors can be in different supersteps at the same time. An unwanted consequence of these software improvements is a loss of accuracy in prediction. This paper proposes an extension of the BSP complexity model to deal with oblivious barriers and shows its accuracy.
Title: Predicting the time of oblivious programs. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
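For context, the standard BSP cost formula that such an extension starts from can be written as follows. This is the textbook model, not the extended model proposed in the paper, which the abstract does not spell out. The symbols are the usual ones: S supersteps, w_{s,i} the local computation of processor i in superstep s, h the number of messages sent or received, g the per-message communication gap, and L the cost of a barrier synchronization.

```latex
T_{\mathrm{BSP}} \;=\; \sum_{s=1}^{S} \left( w_s + g\,h_s + L \right),
\qquad
w_s = \max_{i} w_{s,i},
\quad
h_s = \max_{i} \max\!\left( h^{\mathrm{in}}_{s,i},\; h^{\mathrm{out}}_{s,i} \right)
```

The per-superstep maxima assume that all processors enter and leave each superstep together. With oblivious synchronization that assumption no longer holds, which is one plausible reading of why the abstract reports a loss of prediction accuracy.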
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905020
P. Lombard, Y. Denneulin
Clusters of standard components are becoming a viable alternative to traditional supercomputers. The typical architecture of these clusters is standard PCs connected by a high-performance network. There is also rising interest in using idle computers for computation. The operating system used on this kind of platform is generally Linux because it is stable and flexible: it can be studied, modified and tuned. On a parallel architecture, two important concerns are fault tolerance and load balancing of activity scheduling. This is especially true for clusters that are shared between users and applications and that rely on hardware less robust than dedicated parallel machines. To provide these two services, a mechanism is needed to stop (freeze) activities preemptively and, of course, one to restore them to the state they were in when frozen. In this paper we present our proposal to modify the LinuxThreads library to provide this service. We analyze how this library works and give some performance results for the modified library.
Title: A freeze/unfreeze mechanism for the LinuxThreads library. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905014
M. Avvenuti, Alessio Vecchio
Programming paradigms based on object mobility, such as mobile agents, can contribute greatly to the design of distributed applications, but they also introduce issues that are not present when objects are statically bound to their execution environments. Auxiliary mechanisms are necessary to allow an application to control mobile objects and to let mobile agents interact with each other despite mobility. The work described in this paper deals with how to build a mobile object system based on the Java distributed object model. In particular, we describe how to take advantage of Java RMI's distributed garbage collector to implement an effective remote reference updating scheme, which is necessary to support object interaction even in the presence of mobility.
Title: Supporting remote reference updating through garbage collection in a mobile object system. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905074
K. Großpietsch, J. Büddefeld
A "smart memory" approach is presented, i.e. the new architecture is obtained by extending the functionality of a conventional RAM structure. The architecture contains two innovative features: a small ALU, q bits wide, is associated with every word cell of w bits; and, by extending the memory decoder, multiple access to certain sets of word cells within the memory, as well as activation of their ALUs, becomes possible. It is shown that, based on these features, the standard numerical problem of adding up the m components of a vector of dimension m can be carried out in the new architecture with a time complexity of O(sqrt(m)). For the execution of artificial neural nets, and especially for the on-line recognition of patterns, performance depends mainly on the time-efficient execution of weighted sums. It is shown that in our architecture these weighted sums can be computed quite efficiently. The computation time is far superior to that on sequential von Neumann machines. In addition, we show that, if requested, the training mode of a neural net can also be sped up significantly. This is achieved by means of a simple crossbar switch which can be modularly added to the array of memory chips.
Title: A smart memory architecture for the efficient support of artificial neural nets. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
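The O(sqrt(m)) bound for summing m values can be illustrated with a small simulation. The scheme below is only one way to reach that bound and is an assumption, since the abstract does not describe the authors' actual algorithm. The vector is laid out as roughly sqrt(m) rows of sqrt(m) words, whole rows are added in one simulated parallel step each (standing in for the per-word ALUs working simultaneously), and the remaining partial sums are then added one by one. The function name sqrt_sum and the step counting are hypothetical.

```python
import math

def sqrt_sum(vector):
    """Sum `vector` in about 2*sqrt(m) simulated steps; returns (total, steps)."""
    m = len(vector)
    if m == 0:
        return 0, 0
    k = math.isqrt(m - 1) + 1  # ceil(sqrt(m)): words per row
    # Lay the vector out as rows of k words, padding the last row with zeros.
    rows = []
    for i in range(0, m, k):
        chunk = list(vector[i:i + k])
        rows.append(chunk + [0] * (k - len(chunk)))
    steps = 0
    # Phase 1: fold every row into an accumulator row. All k additions of
    # one row count as a single step because they would run in parallel.
    acc = rows[0][:]
    for row in rows[1:]:
        acc = [a + b for a, b in zip(acc, row)]
        steps += 1
    # Phase 2: add the k partial sums sequentially, one step per addition.
    total = acc[0]
    for x in acc[1:]:
        total += x
        steps += 1
    return total, steps

total, steps = sqrt_sum(list(range(1, 101)))  # m = 100, so k = 10
```

For m = 100 this takes 9 row additions plus 9 sequential additions, i.e. 18 steps instead of the 99 additions a purely sequential machine needs.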
Pub Date: 2001-02-07 DOI: 10.1109/EMPDP.2001.905031
Pierre Boulet, J. Dekeyser, Jean-Luc Levaire, P. Marquet, J. Soula, A. Demeure
Matrix manipulation programs are easily developed using a visual language. For signal processing, a graph of tasks operates on arrays. Each task iterates the same code on different patterns tiling these arrays. In this case, visual specifications of the dependencies between pattern elements are enough to define an application. Starting from the ARRAY-OL language developed by Thomson Marconi Sonar, we propose a graphical environment, GASPARD, dedicated to the data-parallel paradigm. Only elementary SPMD tasks are textual. A full environment has been implemented; it includes a graphical editor, a code transformer and a code generator for SMP computers.
Title: Visual data-parallel programming for signal processing applications. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-01 DOI: 10.1109/EMPDP.2001.905067
Magdalena Sujecka, B. Wiszniewski
The paper describes an idea and the initial experience gained in applying fault-tolerance mechanisms, namely object replication, to on-line debugging of remote objects in distributed software applications. It examines available object-oriented platforms supporting fault tolerance and mechanisms enabling the implementation of remote object debugging. It also reviews this concept from the perspective of the forthcoming CORBA 3 standard.
Title: Remote debugging of CORBA objects. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-01 DOI: 10.1109/EMPDP.2001.905075
H. Stockinger, Kurt Stockinger, E. Schikuta, I. Willers
Large, Petabyte-scale data stores need detailed design considerations about distributing and replicating particular parts of the data store in a cost-effective way. Technical issues need to be analysed and, based on these constraints, an optimisation problem can be formulated. In this paper we provide a novel cost model for building a world-wide distributed Petabyte data store which will be in place starting from 2005 at CERN and its collaborating, world-wide distributed institutes. We elaborate on a framework for assessing potential system costs and influences which are essential for the design of the data store.
Title: Towards a cost model for distributed and replicated data stores. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
Pub Date: 2001-02-01 DOI: 10.1109/EMPDP.2001.905064
E. Casalicchio, Salvatore Tucci
Multiprocessor-based servers are often used for building popular Web sites that have to guarantee an acceptable Quality of Web Service. In common multi-node systems, namely Web server farms, a Web switch (the Dispatcher) routes client requests among the server nodes. This architecture resembles a traditional cluster in which a global scheduler dispatches parallel applications among the server nodes. The main difference is that the load reaching Web server farms tends to occur in waves with intervals of heavy peaks. These heavy-tailed characteristics have motivated the use of policies based on dynamic state information for global scheduling in Web server farms. This paper presents an accurate comparison between static and dynamic policies for different classes of Web sites. The goal is to identify the main features of architectures and load management algorithms that guarantee scalable Web services. We verify that a Web farm with a Dispatcher that has full control over client connections is a very robust architecture. Indeed, we demonstrate that if the Web site provides only HTML pages or simple database searches, the Dispatcher does not need to use sophisticated scheduling algorithms even if the load occurs in heavy bursts. Dynamic scheduling policies appear to be necessary for scalability only when most requests are for Web services whose cost is three or more orders of magnitude higher than that of providing HTML pages with some embedded objects.
Title: Static and dynamic scheduling algorithms for scalable Web server farm. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing.
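The difference between a static and a dynamic dispatching policy can be sketched with a toy model. This is not the simulation used in the paper. The class names, the choice of round-robin as the static policy and least-loaded as the dynamic one, and the cost values (1 for a light request, 1000 for a heavy one) are assumptions made for illustration, and the model ignores request completion over time.

```python
import itertools

class RoundRobinDispatcher:
    """Static policy: cycles through the servers and ignores their state."""
    def __init__(self, n_servers):
        self._cycle = itertools.cycle(range(n_servers))

    def pick(self, loads):
        return next(self._cycle)

class LeastLoadedDispatcher:
    """Dynamic policy: picks the server with the least accumulated work."""
    def pick(self, loads):
        return min(range(len(loads)), key=loads.__getitem__)

def simulate(dispatcher, costs, n_servers):
    """Assign each request cost to a server; return the work per server."""
    loads = [0] * n_servers
    for cost in costs:
        loads[dispatcher.pick(loads)] += cost
    return loads

mixed = [1000, 1, 1000, 1]   # heavy and light requests alternate
uniform = [1] * 6            # only light requests
static_mixed = simulate(RoundRobinDispatcher(2), mixed, 2)
dynamic_mixed = simulate(LeastLoadedDispatcher(), mixed, 2)
```

With uniform light requests both policies end with the same balanced load, while with the mixed workload round-robin piles both heavy requests on one server and the least-loaded policy spreads them. That matches the abstract's qualitative claim that dynamic policies only pay off when request costs differ by several orders of magnitude.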