Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199430
William H. Bell, D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, K. Stockinger, F. Zini
Optimising the use of Grid resources is critical for users to effectively exploit a Data Grid. Data replication is considered a major technique for reducing data access cost to Grid jobs. This paper evaluates a novel replication strategy, based on an economic model, that optimises both the selection of replicas for running jobs and the dynamic creation of replicas in Grid sites. In our model, optimisation agents are located on Grid sites and use an auction protocol for selecting the optimal replica of a data file and a prediction function to make informed decisions about local data replication. We evaluate our replication strategy with OptorSim, a Data Grid simulator developed by the authors. The experiments show that our proposed strategy results in a notable improvement over traditional replication strategies in a Grid environment.
{"title":"Evaluation of an economy-based file replication strategy for a data grid","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199430","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199436
Koji Segawa, O. Tatebe, Yuetsu Kodama, T. Kudoh, T. Shimizu
This paper discusses the design and implementation of a cluster file system, called PVFS-PM, on the SCore cluster system software. This is the first attempt to implement a cluster file system on the SCore system. It is based on the PVFS cluster file system but replaces TCP with the PMv2 communication library supported by SCore to provide a scalable, high-performance cluster file system. PVFS-PM improves the performance by factors of 1.07 and 1.93 for writing and reading, respectively, with 8 I/O nodes, compared with the original PVFS on TCP on a Gigabit Ethernet-connected SCore cluster.
{"title":"Design and implementation of PVFS-PM: a cluster file system on SCore","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199436","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199390
Shoji Ogura, S. Matsuoka, H. Nakada
High-performance peer-to-peer transfer between clusters will be a fundamental technology base for various Grid middleware, such as large-scale data transfer in DataGrid settings or collective communication in Grid-wide MPIs. Two major factors are involved: on the one hand, network pipes with a large RTT × bandwidth product typically become data-starved, resulting in bandwidth loss; on the other hand, when multiple nodes on the clusters attempt simultaneous transfers, the network pipe can become saturated, resulting in packet loss, which in turn may degrade bandwidth in networks with a large RTT × bandwidth product. By dynamically and automatically adjusting transfer parameters between the two clusters, such as the number of network nodes and the number of socket stripes, we could achieve optimal bandwidth even when the network is under heavy contention. In order to arrive at a proper performance model for automated adjustment, we have conducted several simulations, by which we have discovered that such automatic tuning would be beneficial, but that the ideal number of network pipes does not exactly match the simple transfer model of traditional peer-to-peer settings between single nodes.
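To make the "simple transfer model" concrete, the following is a minimal sketch of the textbook bandwidth-delay calculation, not the authors' performance model. It assumes each socket stripe can keep at most one window of data in flight; the function names, the 1 Gbit/s link, the 100 ms RTT, and the 64 KiB window are all hypothetical values chosen for illustration. The abstract itself reports that the ideal number of pipes between clusters deviates from this kind of estimate.

```python
import math


def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this capacity busy."""
    return bandwidth_bps * rtt_s / 8.0


def stripes_to_fill_pipe(bandwidth_bps: float, rtt_s: float, window_bytes: int) -> int:
    """Smallest number of parallel stripes whose combined windows cover the
    bandwidth-delay product (simple model: one window in flight per stripe)."""
    bdp = bandwidth_delay_product(bandwidth_bps, rtt_s)
    return max(1, math.ceil(bdp / window_bytes))


if __name__ == "__main__":
    # Hypothetical wide-area link: 1 Gbit/s, 100 ms round-trip time.
    bdp = bandwidth_delay_product(1e9, 0.1)
    stripes = stripes_to_fill_pipe(1e9, 0.1, 64 * 1024)
    print(f"bandwidth-delay product: {bdp:.0f} bytes")
    print(f"stripes needed with 64 KiB windows: {stripes}")
```

With too few stripes the pipe is data-starved; with many more than this estimate, several nodes sending at once can saturate the link and trigger packet loss, which is the trade-off the abstract describes.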
{"title":"Evaluation of the inter-cluster data transfer on Grid environment","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199390","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199374
Bennet Uk, M. Taufer, T. Stricker, G. Settanni, A. Cavalli, A. Caflisch
The steady increase of computing power at lower and lower cost enables molecular dynamics simulations to investigate the process of protein folding with an explicit treatment of water molecules. Such simulations are typically done with well-known computational chemistry codes like CHARMM. Desktop grids such as the United Devices MetaProcessor are highly attractive platforms, since scavenging for unused machines on intranets and the Internet delivers compute power that is almost free. However, the predominant programming paradigm for current desktop grids is pure task parallelism, which might not fit the needs of protein folding simulations with explicit water molecules. A short overall turn-around time of a simulation remains highly important for research productivity, but the need for an accurate model and long simulation time-scales leads to tasks that are too large for optimal scheduling on a desktop grid. To address this problem, we introduce a combination of task and data parallelism as a well-suited computing paradigm for protein folding investigations on grid platforms. As a proof of concept, we design and implement a simple system for protein folding simulations based on the notion of combined task and data parallelism with clustered workers. Clustered workers are machines grouped into small clusters according to network and CPU performance criteria; they act as super-nodes within a desktop grid, permitting the use of data parallelism in addition to task parallelism. We integrate our new paradigm into the existing software environment of the United Devices MetaProcessor. For a test protein, we reach a better quality of the folding calculations than we reached using just task parallelism on distributed systems.
{"title":"Combining task- and data parallelism to speed up protein folding on a desktop grid platform","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199374","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199397
V. Sunderam, James S. Pascoe, R. Loader
We propose the notion of 'collaborative peer groups', defined as peer-to-peer overlay networks with controlled membership and multiway communication primitives that offer well-defined semantics. Peers join such groups subject to symmetric acceptance, typically based on functional commonalities and, optionally, group-specific authentication. Collaborative peer group networks share the same properties as other peer-to-peer networks, including full decentralization, symmetric abilities, and dynamism. In addition, however, an extensible set of multiway communication primitives, especially appropriate for such peer groups, is provided and supports operations such as reliable message delivery to proximal group members or a subset thereof, message aggregation from peers, and discovery of peers supporting specific functional attributes. Based on several current and emerging application scenarios, we motivate and present the proposed collaborative peer group model, outline the group management architecture, and describe the initial set of communication primitives to be supported. A discussion of the toolkit development methodology and preliminary experiences concludes the paper.
{"title":"Towards a framework for collaborative peer groups","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199397","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199352
L. Cherkasova, Loren Staley
Utility Data Center (UDC) provides a flexible, cost-effective infrastructure to support the hosting of applications for Internet services. In order to enable the design of a "utility-aware" streaming media service which automatically requests the necessary resources from UDC infrastructure, we introduce a set of benchmarks for measuring the basic capacities of streaming media systems. The benchmarks allow one to derive the scaling rules of server capacity for delivering media files which are: i) encoded at different bit rates, ii) streamed from memory vs disk. Using an experimental testbed, we show that these scaling rules are non-trivial. In this paper, we develop a workload-aware, media server performance model which is based on a cost function derived from the set of basic benchmark measurements. We validate this performance model by comparing the predicted and measured media server capacities for a set of synthetic workloads.
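One way a cost function built from single-type benchmark measurements could work is sketched below. This is an illustrative assumption, not the model from the paper: it supposes that a stream of a given type consumes a fraction 1/N of the server, where N is the benchmarked maximum number of concurrent streams of that type alone, and that the costs of a mixed workload add up. All capacities and stream counts are hypothetical.

```python
from typing import Dict, Tuple

# A stream type is identified by (encoding bit rate in kbit/s, source),
# where source is "memory" or "disk".
StreamType = Tuple[int, str]


def server_load(workload: Dict[StreamType, int],
                benchmark_capacity: Dict[StreamType, int]) -> float:
    """Fraction of server capacity consumed by a mixed workload.

    Assumption: each stream of type t costs 1 / benchmark_capacity[t],
    and costs are additive. A load above 1.0 means the server is overloaded.
    """
    return sum(count / benchmark_capacity[stream_type]
               for stream_type, count in workload.items())


if __name__ == "__main__":
    # Hypothetical benchmark results: max concurrent streams per type.
    capacity = {(256, "memory"): 400, (256, "disk"): 200}
    # Hypothetical mixed workload.
    workload = {(256, "memory"): 100, (256, "disk"): 100}
    load = server_load(workload, capacity)
    print(f"server load: {load:.2f}")
    print("within capacity" if load <= 1.0 else "overloaded")
```

Under these assumptions, disk-served streams cost twice as much as memory-served streams at the same bit rate, so the same number of streams can represent very different loads depending on the mix.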
{"title":"Building a performance model of streaming media applications in utility data center environment","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199352","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199394
Samir Djilali
This paper presents the design and implementation of a Remote Procedure Call (RPC) API for programming applications in Peer-to-Peer environments. The P2P-RPC API is designed to address one neglected aspect of Peer-to-Peer systems: the lack of a simple programming interface. In this paper we examine one concrete implementation of the P2P-RPC API derived from OmniRPC (an existing RPC API for the Grid based on the Ninf system). This new API is implemented on top of low-level functionalities of the XtremWeb Peer-to-Peer Computing System. The minimal API defined in this paper provides a basic mechanism for migrating a wide variety of applications that use the RPC mechanism to Peer-to-Peer systems. We evaluate P2P-RPC for a numerical application (NAS EP Benchmark) and demonstrate its performance and fault tolerance properties.
{"title":"P2P-RPC: programming scientific applications on peer-to-peer systems with remote procedure call","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199394","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199399
Yaohang Li, M. Mascagni
High-performance computing on a large-scale computational grid is complicated by the heterogeneous computational capabilities of each node, node unavailability, and unreliable network connectivity. Replicating computation on multiple nodes can significantly improve performance by reducing task completion time in a grid's dynamic environment. We develop an analytical model to determine the number of task replicas needed to meet the performance goals in different computational grid configurations. Furthermore, taking advantage of the statistical nature of grid-based Monte Carlo applications, we extend the computational replication technique to an N-out-of-M scheduling strategy for grid-based Monte Carlo applications, which can potentially form a large category of grid-computing applications. In addition, we establish a corresponding model for the N-out-of-M scheduling mechanism. Simulations are used to validate the computational replication models. Our preliminary results show that the models we use are effective in predicting the required number of replicas to achieve short task completion time with a given high probability.
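The flavour of such a replication model can be illustrated with elementary probability. The sketch below is not the authors' analytical model; it assumes that every node independently finishes a task before the deadline with the same probability p, which ignores the heterogeneity the abstract mentions. The function names and the example values are hypothetical.

```python
from math import comb


def prob_any_replica_done(p: float, k: int) -> float:
    """Probability that at least one of k independent replicas finishes in time."""
    return 1.0 - (1.0 - p) ** k


def min_replicas(p: float, target: float) -> int:
    """Smallest k such that at least one replica finishes with probability >= target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0 and p < 1.0:
        raise ValueError("target must be in [0, 1) when p < 1")
    k = 1
    while prob_any_replica_done(p, k) < target:
        k += 1
    return k


def prob_n_out_of_m(p: float, n: int, m: int) -> float:
    """Probability that at least n of m independent tasks finish in time
    (upper tail of a binomial distribution)."""
    return sum(comb(m, i) * p ** i * (1.0 - p) ** (m - i) for i in range(n, m + 1))


if __name__ == "__main__":
    # Hypothetical node: 50% chance of finishing before the deadline.
    print("replicas for 99% success:", min_replicas(0.5, 0.99))
    print("P(at least 2 of 3 finish):", prob_n_out_of_m(0.5, 2, 3))
```

The N-out-of-M case reflects the idea that a Monte Carlo estimate only needs enough completed samples, so the scheduler can dispatch M tasks and stop once any N of them return.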
{"title":"Improving performance via computational replication on a large-scale computational grid","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199399","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199395
Tyron Stading
Distributed systems require the ability to communicate securely with other computers in the network. To accomplish this, most systems use key management schemes that require prior knowledge of public keys associated with critical nodes. In large, dynamic, anonymous systems, this key sharing method is not viable. Scribe is a method for efficient key management inside a distributed system that uses identity based encryption (IBE). Public resources in a network are addressable by unique identifiers. Using this identifier as a public key, other entities are able to securely access that resource. We evaluate key distribution schemes inside Scribe and provide recommendations for practical implementation to allow for secure, efficient, authenticated communication inside a distributed system.
{"title":"Secure communication in a distributed system using identity based encryption","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199395","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}
Pub Date: 2003-05-12 DOI: 10.1109/CCGRID.2003.1199434
B. Overeinder, F. Brazier, O. Marin
Open multi-agent systems need to cope with the characteristics of the Internet, e.g., dynamic availability of computational resources, latency, and diversity of services. Large-scale multi-agent systems employed on wide-area distributed systems are susceptible to both hardware and software failures. This paper describes AgentScape, a multi-agent system support environment, DARX, a framework for providing fault tolerance in large scale agent systems, and a design for the integration of the two.
{"title":"Fault tolerance in scalable agent support systems: integrating DARX in the AgentScape framework","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199434","journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings."},"publicationDate":"2003-05-12"}