Agile computing: bridging the gap between grid computing and ad-hoc peer-to-peer resource sharing
Niranjan Suri, J. Bradshaw, Marco M. Carvalho, Thomas B. Cowin, M. Breedy, Paul T. Groth, Raul Saavedra
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199423
Published in: CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.
Agile computing may be defined as opportunistically (or on user demand) discovering and taking advantage of available resources in order to improve capability, performance, efficiency, fault tolerance, and survivability. The term agile highlights both the need to react quickly to changes in the environment and the need to exploit transient resources that are available only for short periods of time. Agile computing builds on current research in grid computing, ad-hoc networking, and peer-to-peer resource sharing. This paper describes both the general notion of agile computing and one particular approach that exploits mobility of code, data, and computation. Some performance metrics are also suggested to measure the effectiveness of any approach to agile computing.
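The abstract does not spell out the suggested performance metrics. As a purely hypothetical illustration of the kind of metric involved, one could measure how much work is extractable from transient resources over a time horizon; the names `Resource` and `opportunistic_throughput` below are this sketch's inventions, not the authors' API:

```python
from dataclasses import dataclass

@dataclass
class Resource:
    arrive: float   # when the transient resource appears
    depart: float   # when it disappears again
    speed: float    # work units per second while available

def opportunistic_throughput(resources, horizon):
    """Total work achievable if every transient resource is fully
    exploited for its entire availability window within [0, horizon)."""
    total = 0.0
    for r in resources:
        start = max(r.arrive, 0.0)
        end = min(r.depart, horizon)
        if end > start:
            total += (end - start) * r.speed
    return total
```

An agile system's measured throughput could then be compared against this opportunistic upper bound.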

Criticality-based analysis and design of unstructured peer-to-peer networks as "Complex systems"
F. Kashani, C. Shahabi
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199387
Due to the enormous complexity of unstructured peer-to-peer networks as large-scale, self-configuring, and dynamic systems, the models used to characterize them are either inaccurate, because of oversimplification, or analytically inapplicable, because of their high complexity. By recognizing unstructured peer-to-peer networks as "complex systems", we employ statistical models previously used to characterize complex systems for the formal analysis and efficient design of peer-to-peer networks. We provide two examples of application of this modeling approach that demonstrate its power. For instance, using this approach we have been able to formalize the main problem with normal flooding search, propose a remedy in the form of our probabilistic flooding technique, and rigorously find the optimal operating point for probabilistic flooding, such that it improves the scalability of normal flooding by 99%.
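The core idea of probabilistic flooding can be sketched in a few lines: each node forwards a query to each neighbour with probability p, where p = 1 recovers normal flooding. The paper's contribution is rigorously locating the optimal p (the critical operating point), which this sketch does not derive; the function name and graph representation are this sketch's assumptions:

```python
import random

def probabilistic_flood(adj, source, p, rng):
    """Flood a query over adjacency dict `adj`: each node forwards to
    each neighbour with probability p (p = 1.0 is normal flooding).
    Returns the set of reached nodes and the message count."""
    visited = {source}
    frontier = [source]
    messages = 0
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if rng.random() <= p:
                    messages += 1
                    if v not in visited:
                        visited.add(v)
                        nxt.append(v)
        frontier = nxt
    return visited, messages
```

Tuning p just above the critical point keeps coverage high while cutting the message count that makes normal flooding unscalable.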

Supporting peer-to-peer computing with FlexiNet
T. Fuhrmann
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199392
The formation of overlay-network topologies that reflect the structure of the underlying network infrastructure has rarely been addressed by peer-to-peer applications so far. Often, peer-to-peer protocols resort to purely random formation of their overlay network. This leads to far from optimal performance of such peer-to-peer networks and wastes network resources. In this paper, we describe a simple mechanism that uses programmable network technologies to improve the topology formation process of unstructured peer-to-peer networks. Being a network service, our mechanism does not require any modification of existing applications or computing systems. It thereby assists network operators in improving the performance of their network and relieves programmers of the burden of designing and implementing topology-aware peer-to-peer protocols. Although we use the well-known Gnutella protocol to describe the mechanism of our proposed service, it applies to all kinds of unstructured global peer-to-peer computing applications.
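The contrast between random and topology-aware neighbour selection can be illustrated with a toy model in which coordinates stand in for underlay network distance; this is not FlexiNet's actual mechanism (which operates as an in-network service on programmable nodes), just the selection policy it nudges peers toward:

```python
import math

def topology_aware_neighbours(nodes, k):
    """For each node, pick its k closest peers (distance as a proxy
    for underlay hop count) instead of k random peers -- the kind of
    locality a network service could inject into Gnutella-style
    neighbour selection. `nodes` maps name -> (x, y) coordinate."""
    out = {}
    for name, pos in nodes.items():
        others = sorted(
            (n for n in nodes if n != name),
            key=lambda n: math.dist(pos, nodes[n]))
        out[name] = others[:k]
    return out
```

A random overlay would often link the two far-apart nodes directly, sending every query across the expensive long-haul path.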

Leveraging non-uniform resources for parallel query processing
Tobias Mayr, Philippe Bonnet, J. Gehrke, P. Seshadri
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199360
Modular clusters are now composed of non-uniform nodes with different CPUs, disks, or network cards, so that customers can adapt the cluster configuration to changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load-balancing technique of existing parallel database systems. We show in this paper that dataflow parallelism alone is ill suited for modular clusters because running the same operation on different subsets of the data cannot fully utilize non-uniform hardware resources. We propose and evaluate new load-balancing techniques that blend pipeline parallelism with data parallelism. We consider relational operators as pipelines of fine-grained operations that can be located on different cluster nodes and executed in parallel on different data subsets to best exploit non-uniform resources. We present an experimental study that confirms the feasibility and effectiveness of the new techniques in a parallel execution engine prototype based on the open-source DBMS Predator.
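One ingredient of blending data parallelism with non-uniform hardware is partitioning the data in proportion to node capability rather than evenly; the sketch below shows only that sizing step, under the assumption (not from the paper) that relative node speed is known as a single scalar:

```python
def proportional_partition(n_rows, speeds):
    """Split n_rows across nodes in proportion to relative speed,
    so a faster node receives a larger data subset. Rounding
    remainder goes to the last node so the total is preserved."""
    total = sum(speeds)
    shares = [int(n_rows * s / total) for s in speeds]
    shares[-1] += n_rows - sum(shares)
    return shares
```

Uniform partitioning would leave the fast node idle while the slow node finishes, which is exactly the underutilization the paper attributes to pure dataflow parallelism on modular clusters.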

Fair share on high performance computing systems: what does fair really mean?
S. Kleban, S. Clearwater
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199363
We report on a performance evaluation of a Fair Share system on the ASCI Blue Mountain supercomputer cluster. We study the impact of share allocation under Fair Share on wait times and expansion factor. We also measure the Service Ratio, a typical figure of merit for Fair Share systems, with respect to a number of job parameters. We conclude that Fair Share does little to alter important performance metrics such as expansion factor. This leads to the question of what Fair Share means on cluster machines. The essential difference between Fair Share on a uniprocessor and on a cluster is that the workload on a cluster is not fungible in space or time. We find that cluster machines must be highly utilized and must support checkpointing in order for Fair Share to function more closely to the spirit in which it was originally developed.
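For readers unfamiliar with the mechanism being evaluated: classic fair-share scheduling orders pending jobs by how far each user's accumulated usage lags behind their allocated share. A minimal sketch of that rule (the field names are illustrative, not the evaluated system's):

```python
def fair_share_order(jobs):
    """Order pending jobs so the user furthest below their allocated
    share runs first: ascending accumulated usage relative to share.
    Each job is a dict with "usage" and "share" fields."""
    return sorted(jobs, key=lambda j: j["usage"] / j["share"])
```

The paper's point is that on a cluster this reordering changes little, because a queued parallel job cannot be sliced in space or time the way uniprocessor timesharing assumes.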

MAGNET: a tool for debugging, analyzing and adapting computing systems
M. Gardner, Wu-chun Feng, M. Broxton, Adam Engelhart, J. Hurwitz
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199382
As computing systems grow in complexity, the cluster and grid communities require more sophisticated tools to diagnose, debug, and analyze such systems. We have developed a toolkit called MAGNET (Monitoring Apparatus for General kerNel-Event Tracing) that provides a detailed look at operating-system kernel events with very low overhead. Using the fine-grained information that MAGNET exports from kernel space, challenging problems become amenable to identification and correction. In this paper, we first present the design, implementation, and evaluation of MAGNET. Then, we show its use as a diagnostic tool, an online-monitoring tool, and a tool for building adaptive applications in clusters and grids.
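Low-overhead kernel event tracers of this kind typically record into a fixed-size ring buffer that user space drains asynchronously, so the instrumented path never blocks. The abstract does not describe MAGNET's buffer design, so the following is a generic sketch of that pattern, not MAGNET's implementation:

```python
class EventRing:
    """Fixed-size ring buffer in the spirit of a low-overhead kernel
    event trace: recording overwrites the oldest entries rather than
    ever blocking the instrumented code path."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0          # next write position
        self.count = 0         # live entries, capped at capacity

    def record(self, event):
        self.buf[self.head] = event
        self.head = (self.head + 1) % len(self.buf)
        self.count = min(self.count + 1, len(self.buf))

    def drain(self):
        """Return live events oldest-first and reset the buffer."""
        cap = len(self.buf)
        start = (self.head - self.count) % cap
        out = [self.buf[(start + i) % cap] for i in range(self.count)]
        self.count = 0
        return out
```

The overwrite-on-overflow policy trades completeness for bounded, predictable overhead, which matches the abstract's "very low overhead" design goal.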

Performance guarantees for cluster-based internet services
Chang Li, Gang Peng, Kartik Gopalan, T. Chiueh
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199378
As web-based transactions become an essential element of everyday corporate and commerce activities, it becomes increasingly important that the performance of web-based services be predictable and guaranteed even in the presence of wildly fluctuating input loads. In this paper, we propose a general implementation framework for providing quality-of-service (QoS) guarantees for cluster-based Internet services, such as e-commerce or directory services. We describe the design, implementation, and evaluation of a web request distribution system called Gage, which can provide every subscriber with a distinct guarantee on the number of generic web requests serviced per second, regardless of the total input load at run time. Gage is one of the first systems to support QoS guarantees involving multiple system resources, i.e., CPU, disk, and network. The front-end request distribution server of Gage distributes incoming requests among a cluster of back-end web server nodes so as to maintain per-subscriber QoS guarantees and load balance among the back-end servers. Each back-end web server node includes a Gage module, which performs distributed TCP splicing and detailed resource-usage accounting. Performance evaluation of the fully operational Gage prototype demonstrates that the proposed architecture can indeed provide the guaranteed request throughput for different classes of web accesses, even in the presence of excessive input loads. The additional performance overhead associated with QoS support in Gage is merely 3.06%.
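A per-subscriber requests-per-second guarantee is commonly enforced at a front end with a token-bucket admission test; the abstract does not say Gage uses exactly this mechanism, so treat the following as a generic sketch of the guarantee semantics rather than Gage's design:

```python
class TokenBucket:
    """Per-subscriber request-rate control: tokens refill at the
    guaranteed rate (requests/second) up to a burst cap; a request
    is admitted only while a whole token remains."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = 0.0

    def admit(self, now):
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With one bucket per subscriber, an overload from one class of traffic exhausts only that class's tokens, leaving other subscribers' guaranteed throughput intact.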

An extended home-based coherence protocol for causally consistent replicated read-write objects
J. Brzeziński, Michal Szychowiak
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199408
This paper considers the reliability of software Distributed Shared Memory systems in which the unit of sharing is a persistent read-write object. We present an extended coherence protocol for the causal consistency model, which integrates replication management with independent checkpointing. It uses a novel coordinated burst checkpoint operation to replicate consistent checkpoints of shared objects in the local memory of distinct system nodes. No special reliable hardware devices are required. The protocol offers high availability of shared objects with limited overhead and ensures fast recovery in the case of multiple node failures. In the case of network partitioning, all processes in a majority partition of the system can continue to access all the objects.
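Two building blocks mentioned in the abstract can be sketched generically: placing checkpoint replicas on distinct nodes other than an object's home, and the majority-partition test that decides who may keep accessing objects after a split. Round-robin placement here is this sketch's choice, not necessarily the protocol's:

```python
def place_replicas(home, nodes, k):
    """Pick k distinct nodes (other than `home`) to hold an object's
    checkpoint replica, round-robin from the home's ring position."""
    i = nodes.index(home)
    return [nodes[(i + j) % len(nodes)] for j in range(1, k + 1)]

def in_majority(partition, all_nodes):
    """After a network split, only processes in a strict-majority
    partition may continue accessing the shared objects."""
    return len(partition) * 2 > len(all_nodes)
```

The majority rule is what prevents two disconnected partitions from both updating the same object and diverging.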

OmniRPC: a grid RPC system for parallel programming in cluster and grid environment
M. Sato, T. Boku, D. Takahashi
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199370
We have designed and implemented a Grid RPC system called OmniRPC for parallel programming in cluster and grid environments. While OmniRPC inherits its API from Ninf, the programmer can use OpenMP for easy-to-use parallel programming because the API is designed to be thread-safe. To support typical master-worker grid applications such as parametric execution, OmniRPC provides an automatically initializable remote module that sends and stores data to a remote executable invoked on the remote host. Since the executable may accept several requests for subsequent calls by keeping the connection alive, the data set by the initialization is re-used, resulting in efficient execution by reducing the amount of communication. The OmniRPC system also supports a local environment with "rsh", a grid environment with Globus, and remote hosts with "ssh". Furthermore, the user can use the same program over OmniRPC for both clusters and grids because a typical grid resource is regarded simply as a cluster of clusters distributed geographically. For a cluster on a private network, an agent process running on the server host functions as a proxy that relays communication between the client and the remote executables by multiplexing it into one connection to the client. This feature allows a single client to use a thousand remote computing hosts.
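The initialize-once, call-many pattern behind OmniRPC's remote modules can be mimicked locally with a thread pool: each worker receives the shared data set once and is then reused across parametric calls, so the data is never resent. This is a stand-in model (the `Worker` class and `parametric_sweep` are this sketch's names), not OmniRPC's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

class Worker:
    """Stands in for a remote executable: initialised once with the
    shared data set, then reused across calls on a live connection."""

    def __init__(self, dataset):
        self.dataset = dataset          # sent once, reused by later calls

    def call(self, param):
        # a toy computation over the pre-staged data
        return sum(x * param for x in self.dataset)

def parametric_sweep(dataset, params, n_workers=4):
    """Master-worker parametric execution over initialised workers."""
    workers = [Worker(dataset) for _ in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futs = [pool.submit(workers[i % n_workers].call, p)
                for i, p in enumerate(params)]
        return [f.result() for f in futs]
```

Because the data set lives in the worker, each call ships only the small parameter, which is the communication saving the abstract describes.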

A greedy I/O scheduling method in the storage system of clusters
Xinrong Zhou, Tong Wei
Pub Date: 2003-05-12  DOI: 10.1109/CCGRID.2003.1199437
As clusters grow in size, their processing capability increases rapidly. Users exploit this increased power to run scientific, physical, and multimedia applications. Such data-intensive applications require a high-performance storage subsystem, and parallel storage systems such as RAID are widely used in today's clusters. In this paper, we propose a "greedy" I/O scheduling method that uses the scatter and gather operations inside the PCI-SCSI adapter to combine as many I/O operations within the same disk as possible. In this way we reduce the number of I/O operations and improve the performance of the whole storage system. After analyzing the RAID control strategy, we find that combining I/O commands may also cause data movement in memory, and this movement increases the system's overhead. The experimental results in our real-time operating environment show that better performance can be achieved: the longer the data length, the greater the improvement, in some cases over 40%.
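The greedy combining step can be sketched as: group pending requests by disk, sort by offset, and merge contiguous runs so several small transfers become one scatter/gather transfer. This models only the request-merging logic, not the adapter-level mechanics or the in-memory copying cost the paper analyzes:

```python
def coalesce(requests):
    """Greedily combine per-disk I/O requests. Each request is a
    (disk, offset, length) tuple; contiguous requests on the same
    disk are merged into one larger transfer."""
    by_disk = {}
    for disk, off, length in requests:
        by_disk.setdefault(disk, []).append((off, length))
    merged = []
    for disk, reqs in by_disk.items():
        reqs.sort()
        cur_off, cur_len = reqs[0]
        for off, length in reqs[1:]:
            if off == cur_off + cur_len:        # contiguous: combine
                cur_len += length
            else:                               # gap: flush current run
                merged.append((disk, cur_off, cur_len))
                cur_off, cur_len = off, length
        merged.append((disk, cur_off, cur_len))
    return merged
```

Fewer, larger commands amortize per-command overhead, which is why the measured gain grows with data length.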