
CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings: latest papers

Agile computing: bridging the gap between grid computing and ad-hoc peer-to-peer resource sharing
Niranjan Suri, J. Bradshaw, Marco M. Carvalho, Thomas B. Cowin, M. Breedy, Paul T. Groth, Raul Saavedra
Agile computing may be defined as opportunistically (or on user demand) discovering and taking advantage of available resources in order to improve capability, performance, efficiency, fault tolerance, and survivability. The term agile is used to highlight both the need to quickly react to changes in the environment as well as the need to exploit transient resources only available for short periods of time. Agile computing builds on current research in grid computing, ad-hoc networking, and peer-to-peer resource sharing. This paper describes both the general notion of agile computing as well as one particular approach that exploits mobility of code, data, and computation. Some performance metrics are also suggested to measure the effectiveness of any approach to agile computing.
DOI: https://doi.org/10.1109/CCGRID.2003.1199423. Published: 2003-05-12.
Citations: 25
Criticality-based analysis and design of unstructured peer-to-peer networks as "Complex systems"
F. Kashani, C. Shahabi
Due to the enormous complexity of unstructured peer-to-peer networks as large-scale, self-configuring, and dynamic systems, the models used to characterize these systems are either inaccurate, because of oversimplification, or analytically inapplicable, due to their high complexity. By recognizing unstructured peer-to-peer networks as "complex systems", we employ statistical models previously used to characterize complex systems for the formal analysis and efficient design of peer-to-peer networks. We provide two examples of applying this modeling approach that demonstrate its power. For instance, using this approach we have been able to formalize the main problem with normal flooding search, propose a remedial approach with our probabilistic flooding technique, and rigorously find the optimal operating point for probabilistic flooding, such that it improves the scalability of normal flooding by 99%.
DOI: https://doi.org/10.1109/CCGRID.2003.1199387. Published: 2003-05-12.
Citations: 81
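To make the contrast between normal and probabilistic flooding in the Kashani and Shahabi abstract concrete, here is a minimal sketch. It assumes the simplest reading of probabilistic flooding: each node forwards a query to each neighbor independently with probability p, so p = 1.0 is normal flooding. The function name, the adjacency-list graph, and the TTL handling are illustrative assumptions, not the authors' model, and the sketch does not attempt to find their optimal operating point.

```python
import random
from collections import deque

def probabilistic_flood(graph, source, p, ttl, rng):
    """Flood a query from `source`; forward on each edge with probability p.

    graph: dict mapping node -> list of neighbor nodes (hypothetical layout).
    Returns (set of nodes reached, number of messages sent).
    """
    reached = {source}
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted, this node does not forward
        for neighbor in graph[node]:
            if rng.random() >= p:
                continue  # skip this edge with probability 1 - p
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached, messages

if __name__ == "__main__":
    demo_graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    for p in (1.0, 0.5):
        nodes, sent = probabilistic_flood(demo_graph, 0, p, ttl=3, rng=random.Random(1))
        print(f"p={p}: reached {len(nodes)} nodes with {sent} messages")
```

Lowering p trades coverage for message count; the abstract's claim concerns choosing that trade-off rigorously, which this sketch leaves to the caller.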
Supporting peer-to-peer computing with FlexiNet
T. Fuhrmann
The formation of suitable overlay-network topologies that reflect the structure of the underlying network infrastructure has rarely been addressed by peer-to-peer applications so far. Often, peer-to-peer protocols restrict themselves to purely random formation of their overlay network. This leads to far-from-optimal performance of such peer-to-peer networks and ruthlessly wastes network resources. In this paper, we describe a simple mechanism that uses programmable network technologies to improve the topology formation process of unstructured peer-to-peer networks. Being a network service, our mechanism does not require any modification of existing applications or computing systems. In this way, it assists network operators in improving the performance of their network and relieves programmers of the burden of designing and implementing topology-aware peer-to-peer protocols. Although we use the well-known Gnutella protocol to describe the mechanism of our proposed service, it applies to all kinds of unstructured global peer-to-peer computing applications.
DOI: https://doi.org/10.1109/CCGRID.2003.1199392. Published: 2003-05-12.
Citations: 0
Leveraging non-uniform resources for parallel query processing
Tobias Mayr, Philippe Bonnet, J. Gehrke, P. Seshadri
Modular clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of existing parallel database systems. We show in this paper that dataflow parallelism alone is ill-suited for modular clusters because running the same operation on different subsets of the data cannot fully utilize non-uniform hardware resources. We propose and evaluate new load balancing techniques that blend pipeline parallelism with data parallelism. We consider relational operators as pipelines of fine-grained operations that can be located on different cluster nodes and executed in parallel on different data subsets to best exploit non-uniform resources. We present an experimental study that confirms the feasibility and effectiveness of the new techniques in a parallel execution engine prototype based on the open-source DBMS Predator.
DOI: https://doi.org/10.1109/CCGRID.2003.1199360. Published: 2003-05-12.
Citations: 5
Fair share on high performance computing systems: what does fair really mean?
S. Kleban, S. Clearwater
We report on a performance evaluation of a Fair Share system at the ASCI Blue Mountain supercomputer cluster. We study the impacts of share allocation under Fair Share on wait times and expansion factor. We also measure the Service Ratio, a typical figure of merit for Fair Share systems, with respect to a number of job parameters. We conclude that Fair Share does little to alter important performance metrics such as expansion factor. This leads to the question of what Fair Share means on cluster machines. The essential difference between Fair Share on a uni-processor and a cluster is that the workload on a cluster is not fungible in space or time. We find that cluster machines must be highly utilized and support checkpointing in order for Fair Share to function more closely to the spirit in which it was originally developed.
DOI: https://doi.org/10.1109/CCGRID.2003.1199363. Published: 2003-05-12.
Citations: 40
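The Kleban and Clearwater abstract names expansion factor as a key metric without defining it. A definition commonly used in batch-scheduling work, assumed here and not confirmed by the abstract, is (wait time + run time) / run time, so a job that never waits has an expansion factor of 1. The helper below is a small worked version of that formula; the function names are hypothetical.

```python
def expansion_factor(wait_time, run_time):
    """Assumed definition: (wait + run) / run, in any consistent time unit."""
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return (wait_time + run_time) / run_time

def mean_expansion_factor(jobs):
    """Average expansion factor over an iterable of (wait_time, run_time) pairs."""
    factors = [expansion_factor(wait, run) for wait, run in jobs]
    return sum(factors) / len(factors)

if __name__ == "__main__":
    # A 10-minute job that waited 30 minutes is "expanded" fourfold.
    print(expansion_factor(30, 10))
    print(mean_expansion_factor([(30, 10), (0, 10)]))
```

Because the metric is relative to run time, short jobs are penalized far more than long jobs by the same absolute wait.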
MAGNET: a tool for debugging, analyzing and adapting computing systems
M. Gardner, Wu-chun Feng, M. Broxton, Adam Engelhart, J. Hurwitz
As computing systems grow in complexity, the cluster and grid communities require more sophisticated tools to diagnose, debug and analyze such systems. We have developed a toolkit called MAGNET (Monitoring Apparatus for General kerNel-Event Tracing) that provides a detailed look at operating-system kernel events with very low overhead. Using the fine-grained information that MAGNET exports from kernel space, challenging problems become amenable to identification and correction. In this paper, we first present the design, implementation and evaluation of MAGNET. Then, we show its use as a diagnostic tool, an online-monitoring tool and a tool for building adaptive applications in clusters and grids.
DOI: https://doi.org/10.1109/CCGRID.2003.1199382. Published: 2003-05-12.
Citations: 17
Performance guarantees for cluster-based internet services
Chang Li, Gang Peng, Kartik Gopalan, T. Chiueh
As web-based transactions become an essential element of everyday corporate and commerce activities, it becomes increasingly important that the performance of web-based services be predictable and guaranteed even in the presence of wildly fluctuating input loads. In this paper, we propose a general implementation framework to provide quality of service (QoS) guarantees for cluster-based Internet services, such as E-commerce or directory services. We describe the design, implementation, and evaluation of a web request distribution system called Gage, which can provide every subscriber with a distinct guarantee on the number of generic web requests that are serviced per second regardless of the total input load at run time. Gage is one of the first systems that can support QoS guarantees involving multiple system resources, i.e., CPU, disk, and network. The front-end request distribution server of Gage distributes incoming requests among a cluster of back-end web server nodes so as to maintain per-subscriber QoS guarantees and load balance among the back-end servers. Each back-end web server node includes a Gage module, which performs distributed TCP splicing and detailed resource usage accounting. Performance evaluation of the fully operational Gage prototype demonstrates that the proposed architecture can indeed provide the guaranteed request throughput for different classes of web accesses, even in the presence of excessive input loads. The additional performance overhead associated with QoS support in Gage is merely 3.06%.
DOI: https://doi.org/10.1109/CCGRID.2003.1199378. Published: 2003-05-12.
Citations: 4
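The Gage abstract describes a per-subscriber guarantee on requests serviced per second but does not say how requests are accounted. One generic way to track whether a request falls within a subscriber's reserved rate is a token bucket; the sketch below uses that substitute technique and should not be read as Gage's mechanism. The class name, the burst parameter, and the explicit `now` timestamps are all illustrative assumptions.

```python
class SubscriberBucket:
    """Token bucket marking requests that fit a subscriber's reserved rate."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # reserved requests per second
        self.capacity = burst         # maximum tokens that can accumulate
        self.tokens = burst
        self.last = 0.0               # timestamp of the previous call

    def within_reservation(self, now):
        """Return True if a request arriving at time `now` fits the reservation."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

if __name__ == "__main__":
    bucket = SubscriberBucket(rate_per_sec=2.0, burst=2.0)
    for t in (0.0, 0.0, 0.0, 0.5):
        print(t, bucket.within_reservation(t))
```

A front end could dispatch in-reservation requests first and treat the rest as best effort; how Gage actually schedules across CPU, disk, and network is not stated in the abstract.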
An extended home-based coherence protocol for causally consistent replicated read-write objects
J. Brzeziński, Michal Szychowiak
This paper considers the reliability of software Distributed Shared Memory systems where the unit of sharing is a persistent read-write object. We present an extended coherence protocol for the causal consistency model, which integrates replication management with independent checkpointing. It uses a novel coordinated burst checkpoint operation in order to replicate consistent checkpoints of shared objects in the local memory of distinct system nodes. No special reliable hardware devices are required. The protocol offers high availability of shared objects with limited overhead and ensures fast recovery in case of multiple node failures. In case of network partitioning, all the processes in a majority partition of the system can continuously access all the objects.
DOI: https://doi.org/10.1109/CCGRID.2003.1199408. Published: 2003-05-12.
Citations: 0
OmniRPC: a grid RPC system for parallel programming in cluster and grid environment
M. Sato, T. Boku, D. Takahashi
We have designed and implemented a Grid RPC system called OmniRPC, for parallel programming in cluster and grid environments. While OmniRPC inherits its API from Ninf, the programmer can use OpenMP for easy-to-use parallel programming because the API is designed to be thread-safe. To support typical master-worker grid applications such as parametric execution, OmniRPC provides an automatically initializable remote module to send and store data to a remote executable invoked on the remote host. Since it may accept several requests for subsequent calls by keeping the connection alive, the data set by the initialization is re-used, resulting in efficient execution by reducing the amount of communication. The OmniRPC system also supports a local environment with "rsh", a grid environment with Globus, and remote hosts with "ssh". Furthermore, the user can use the same program over OmniRPC for both clusters and grids because a typical grid resource is regarded simply as a cluster of clusters distributed geographically. For a cluster over a private network, an agent process running on the server host functions as a proxy to relay communications between the client and the remote executables by multiplexing the communications into one connection to the client. This feature allows a single client to use a thousand remote computing hosts.
DOI: https://doi.org/10.1109/CCGRID.2003.1199370. Published: 2003-05-12.
Citations: 76
A greedy I/O scheduling method in the storage system of clusters
Xinrong Zhou, Tong Wei
As clusters grow larger, their processing capability increases rapidly. Users will exploit this increased power to run scientific, physical and multimedia applications. These kinds of data-intensive applications require a high-performance storage subsystem. Parallel storage systems such as RAID are widely used in today's clusters. In this paper, we present a "greedy" I/O scheduling method that utilizes Scatter and Gather operations inside the PCI-SCSI adapter to combine as many I/O operations within the same disk as possible. In this way we reduce the number of I/O operations and improve the performance of the whole storage system. After analyzing the RAID control strategy, we find that combining I/O commands may also cause data movement in memory, and this movement increases the system's overhead. Experimental results in our real-time operating environment show that better performance can be achieved. The longer the data length, the greater the improvement; in some cases, the improvement exceeds 40%.
随着集群规模的增大,集群的处理能力迅速提高。用户将利用这种增强的功能来运行科学、物理和多媒体应用程序。这些类型的数据密集型应用程序需要高性能存储子系统。并行存储系统如RAID在当今的集群中得到了广泛的应用。在本文中,我们提出了一种“贪婪”I/O调度方法,该方法利用PCI-SCSI适配器内部的Scatter和Gather操作,在同一个磁盘内组合尽可能多的I/O操作。这样可以减少I/O操作的次数,提高整个存储系统的性能。通过对RAID控制策略的分析,我们发现I/O命令的组合也会引起内存中的数据移动,这种移动会增加系统的开销。在我们的实时操作环境下的实验结果表明,该方法可以取得较好的性能。数据长度越长,我们能得到的改进就越好,在某些情况下,我们甚至能得到超过40%的改进。
{"title":"A greedy I/O scheduling method in the storage system of clusters","authors":"Xinrong Zhou, Tong Wei","doi":"10.1109/CCGRID.2003.1199437","DOIUrl":"https://doi.org/10.1109/CCGRID.2003.1199437","url":null,"abstract":"As the size of cluster becomes larger, the process ability of a cluster increases rapidly. Users will exploit this increased power to run scientific, physical and multimedia applications. These kinds of data-intensive applications require high performance storage subsystem. Parallel storage system such as RAID is widely used in today's clusters. In this paper, we bring out a \"greedy\" I/O scheduling method that utilizes Scatter and Gather operations inside the PCI-SCSI adapter to combine as many I/O operations within the same disk as possible. In this way we reduce the numbers of I/O operations and improve the performance of the whole storage system. After analyzing RAID control strategy, we find out that I/O commands' combination may also bring up data movement in memory and this kind of movement will increase the system's overhead. The experiment results in our real time operating environment show that a better performance can be achieved. The longer the data length is, the better improvement we can get, in some case, we can even get over 40% enhancement.","PeriodicalId":433323,"journal":{"name":"CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126796624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
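The greedy combining idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `IORequest` and `MergedCommand` types, the `greedy_merge` function, the `max_sg_entries` limit, and the rule that only block-contiguous requests on the same disk are merged are all assumptions made for illustration. The sketch sorts pending requests by disk and starting block, then folds each request into the previous command whenever it continues that command on the same disk, recording one scatter/gather entry per original buffer so that no data needs to be copied in memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class IORequest:
    """One pending I/O request (hypothetical layout, for illustration only)."""
    disk: int       # index of the target disk
    start: int      # first block addressed
    length: int     # number of blocks
    buffer_id: int  # identifies the memory buffer holding the data


@dataclass
class MergedCommand:
    """One command sent to a disk, covering one or more original requests."""
    disk: int
    start: int
    length: int
    # One (buffer_id, length) entry per original request, in block order.
    sg_list: List[Tuple[int, int]] = field(default_factory=list)


def greedy_merge(requests: List[IORequest],
                 max_sg_entries: int = 16) -> List[MergedCommand]:
    """Greedily combine block-contiguous requests that target the same disk.

    max_sg_entries is an assumed cap on scatter/gather entries per command.
    """
    merged: List[MergedCommand] = []
    for req in sorted(requests, key=lambda r: (r.disk, r.start)):
        last = merged[-1] if merged else None
        can_extend = (
            last is not None
            and last.disk == req.disk
            and last.start + last.length == req.start
            and len(last.sg_list) < max_sg_entries
        )
        if can_extend:
            # Extend the previous command instead of issuing a new one.
            last.length += req.length
            last.sg_list.append((req.buffer_id, req.length))
        else:
            merged.append(MergedCommand(req.disk, req.start, req.length,
                                        [(req.buffer_id, req.length)]))
    return merged


if __name__ == "__main__":
    pending = [
        IORequest(disk=0, start=0, length=8, buffer_id=1),
        IORequest(disk=1, start=0, length=8, buffer_id=2),
        IORequest(disk=0, start=8, length=8, buffer_id=3),
        IORequest(disk=0, start=32, length=8, buffer_id=4),
    ]
    for cmd in greedy_merge(pending):
        print(cmd)
```

In this example the two adjacent requests on disk 0 collapse into a single 16-block command with two scatter/gather entries, so four requests become three commands. A real adapter would impose its own limits on entry count and transfer size, which this sketch only approximates with `max_sg_entries`.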