Planning Large Data Transfers in Institutional Grids
Fatiha Bouabache, T. Hérault, Sylvain Peyronnet, F. Cappello
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010). DOI: 10.1109/CCGRID.2010.68
In grid computing, many scientific and engineering applications require access to large amounts of distributed data. The size and number of these data collections have been growing rapidly in recent years, and the cost of data transmission accounts for a significant part of the overall execution time. When communication streams flow concurrently over shared links, transport protocols struggle to allocate bandwidth fairly among the streams, and the network is used sub-optimally. One way to deal with this situation is to schedule the communications so that the network is used optimally. We focus on the case of large data transfers that can be completely described at initialization time. In this case, a data-migration plan can be computed at initialization time and then executed; however, this computation phase must be short compared to the actual execution of the plan. We propose a best-effort solution that computes an approximate communication plan based on uniform random sampling of the possible schedules. We show the effectiveness of this approach both theoretically and through simulation.
Bandwidth Allocation for Iterative Data-Dependent E-science Applications
Eun-Sung Jung, S. Ranka, S. Sahni
CCGrid 2010. DOI: 10.1109/CCGRID.2010.114
We develop a novel framework for supporting e-Science applications that require streaming of information between sites. Using a Synchronous Dataflow (SDF) model, our framework incorporates the communication times inherent in large-scale distributed applications, and it can be used to formulate the bandwidth allocation problem with throughput constraints as a multi-commodity linear program. Our algorithms determine how much bandwidth is allocated to each edge while satisfying temporal constraints on collaborative tasks. Simulation results show that the bandwidth allocation produced by the formulated linear program outperforms allocations produced by simple heuristics.
A Memory Centric Kernel Framework for Accelerating Short-Range, Interactive Particle Simulation
Ian Stewart, Shujia Zhou
CCGrid 2010. DOI: 10.1109/CCGRID.2010.108
To maximize the performance of emerging multi- and many-core accelerators such as the IBM Cell B.E. and the NVIDIA GPU, a Memory Centric Kernel Framework (MCKF) was developed. MCKF allows a user to decompose the physical space of an application based on the fast memory available in the accelerators. Reducing the cost of accessing data in this way lets an application exploit the accelerators' extraordinary computing power. MCKF is both generic and flexible because it encapsulates hardware-specific characteristics. It has been implemented and tested for short-range, interactive particle simulation on IBM Cell B.E. blades.
Low-Cost Tuning of Two-Step Algorithms for Scheduling Mixed-Parallel Applications onto Homogeneous Clusters
S. Hunold
CCGrid 2010. DOI: 10.1109/CCGRID.2010.52
Due to the strong increase in the number of processing units available to the end user, expressing the parallelism of an algorithm is a major challenge for many researchers. Parallel applications are often expressed using a task-parallel model (task graphs), in which tasks can be executed concurrently unless they share a dependency. If these tasks can also be executed in a data-parallel fashion, e.g., by using MPI or OpenMP, we call it a mixed-parallel programming model. Mixed-parallel applications are often modeled as directed acyclic graphs (DAGs), where nodes represent tasks and edges represent data dependencies. To execute a mixed-parallel application efficiently, a good scheduling strategy is required to map the tasks onto the available processors. Several algorithms for scheduling mixed-parallel applications onto a homogeneous cluster have been proposed; MCPA (Modified CPA) has been shown to lead to efficient schedules. In its allocation phase, MCPA considers the total number of processors allocated to all potentially concurrently running tasks as well as the number of processors in the cluster. This article shows how MCPA can be extended to obtain a more balanced workload in situations where concurrently running tasks differ significantly in their number of operations. We also show how the allocation procedure can be tuned to deal not only with regular DAGs (e.g., FFT) but also with irregular ones, and we investigate whether additional optimizations of the mapping procedure, such as packing of allocations or backfilling, can reduce the makespan of the schedules.
Data Injection at Execution Time in Grid Environments Using Dynamic Data Driven Application System for Wildland Fire Spread Prediction
Roque Rodríguez, A. Cortés, T. Margalef
CCGrid 2010. DOI: 10.1109/CCGRID.2010.74
In our research, we use two Dynamic Data Driven Application System (DDDAS) methodologies to predict wildfire propagation. Our goal is to build a system that dynamically adapts to constant changes in environmental conditions when a hazard occurs, under strict real-time deadlines. To this end, we are building a parallel wildfire prediction method that assimilates real-time data and injects it into the prediction process at execution time. In this paper, we propose a strategy for data injection in distributed environments.
D-Cloud: Design of a Software Testing Environment for Reliable Distributed Systems Using Cloud Computing Technology
Takayuki Banzai, Hitoshi Koizumi, Ryo Kanbayashi, Takayuki Imada, T. Hanawa, M. Sato
CCGrid 2010. DOI: 10.1109/CCGRID.2010.72
In this paper, we propose a software testing environment, called D-Cloud, that uses cloud computing technology and virtual machines with a fault injection facility. The importance of high dependability in software systems has increased in recent years, yet exhaustive testing of such systems is expensive and time-consuming, and in many cases sufficient testing is not possible. In particular, parallel and distributed systems are often difficult to test in the real world after deployment, even though reliable systems such as high-availability servers are themselves parallel and distributed systems. D-Cloud is a cloud system that manages virtual machines with a fault injection facility. It sets up a test environment on the cloud resources using a given system configuration file and executes several tests automatically according to a given scenario; within such a scenario, D-Cloud enables fault tolerance testing by having the virtual machines cause device faults. We have designed D-Cloud using Eucalyptus and an XML-based description language for the system configuration and the fault-injection scenario. We found that D-Cloud allows a user to easily set up and test a distributed system on the cloud and effectively reduces the cost and time of testing.
Linear Combinations of DVFS-Enabled Processor Frequencies to Modify the Energy-Aware Scheduling Algorithms
N. B. Rizvandi, J. Taheri, Albert Y. Zomaya, Young Choon Lee
CCGrid 2010. DOI: 10.1109/CCGRID.2010.38
Energy consumption in distributed computing systems has become a critical issue due to environmental concerns. In response, many energy-aware scheduling algorithms have been developed, primarily by using the dynamic voltage-frequency scaling (DVFS) capability incorporated in recent commodity processors. The majority of these algorithms involve two passes: schedule generation and slack reclamation, the latter typically achieved by lowering the processor frequency for tasks with slack. In this paper, we revisit this energy reduction technique from a different perspective and propose a new slack reclamation algorithm that uses a linear combination of the maximum and minimum processor frequencies to decrease energy consumption. The algorithm has been evaluated on three sets of task graphs: 1,500 randomly generated task graphs, and 300 task graphs for each of two real-world applications (Gauss-Jordan and LU decomposition). The results show that the proposed algorithm saves 13.5%, 25.5% and 0.11% of energy for the random, LU decomposition and Gauss-Jordan task graphs, respectively; the corresponding figures for the reference DVFS-based algorithm are 12.4%, 24.6% and 0.1%.
A Proximity-Based Self-Organizing Framework for Service Composition and Discovery
Agostino Forestiero, C. Mastroianni, Giuseppe Papuzzo, G. Spezzano
CCGrid 2010. DOI: 10.1109/CCGRID.2010.48
The ICT market is experiencing an important shift from the request/provisioning of products toward a service-oriented view in which everything (computing, storage, applications) is provided as a network-enabled service. It often happens that a solution to a problem cannot be offered by a single service, but only by composing multiple basic services in a workflow. Service composition is therefore an important research topic, involving issues such as the design and execution of a workflow and the discovery of the component services on the network. This paper deals with the latter issue and presents an ant-inspired framework that facilitates collective discovery requests, issued to search the network for all the basic services that will compose a specific workflow. The idea is to reorganize the services so that the descriptors of services that are often used together are placed on neighboring peers. A single query can then find multiple basic services, which decreases the number of necessary queries and, consequently, lowers the search time and the network load.
UnaGrid: On Demand Opportunistic Desktop Grid
Harold E. Castro, Eduardo Rosales, Mario Villamizar, A. Jimenez
CCGrid 2010. DOI: 10.1109/CCGRID.2010.79
This paper deals with the design and implementation of a virtual opportunistic grid infrastructure that takes advantage of the idle processing capacity available in the computer labs of a university campus, ensuring that local users have priority in accessing the computational resources while a virtual cluster simultaneously uses the resources they leave idle. A virtualization strategy is proposed to allow the deployment of opportunistic virtual clusters whose integration provides a scalable grid solution capable of supplying the high performance computing (HPC) needs of e-Science projects. The proposed solution was implemented and tested through the execution of opportunistic virtual clusters with customized application environments for projects from different scientific disciplines, demonstrating high efficiency in result generation.
Identification, Modelling and Prediction of Non-periodic Bursts in Workloads
M. Lassnig, T. Fahringer, V. Garonne, A. Molfetas, M. Branco
CCGrid 2010. DOI: 10.1109/CCGRID.2010.118
Non-periodic bursts are prevalent in the workloads of large-scale applications. Existing workload models do not predict such non-periodic bursts well because they focus mainly on repeatable base functions. We begin by showing the need to include bursts in workload models by investigating their detrimental effects in a petabyte-scale distributed data management system. This work then makes three contributions. First, we analyse the accuracy of five existing prediction models on workloads of data and computational grids, as well as on derived synthetic workloads. Second, we introduce a novel averages-based model to predict bursts in arbitrary workloads. Third, we present a novel metric, mean absolute estimated distance, to assess the prediction accuracy of the model. Using our model and metric, we show that burst behaviour in workloads can be identified, quantified and predicted independently of the underlying base functions. Furthermore, our model and metric are applicable to arbitrary kinds of burst prediction for time series.