{"title":"Scalable Transaction Management with Snapshot Isolation on Cloud Data Management Systems","authors":"Vinit Padhye, A. Tripathi","doi":"10.1109/CLOUD.2012.102","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.102","url":null,"abstract":"We address the problem of building scalable transaction management mechanisms for multi-row transactions on key-value storage systems. We develop scalable techniques for transaction management utilizing the snapshot isolation (SI) model. Because the SI model can lead to non-serializable transaction executions, we investigate two conflict detection techniques for ensuring serializability under SI. To support scalability, we investigate system models and mechanisms in which the transaction management functions are decoupled from the storage system and integrated with the application-level processes. We present two system models and demonstrate their scalability under the scale-out paradigm of Cloud computing platforms. In the first system model, all transaction management functions are executed in a fully decentralized manner by the application processes. The second model is based on a hybrid approach in which the conflict detection techniques are implemented by a dedicated service. We perform a comparative evaluation of these models using the TPC-C benchmark and demonstrate their scalability.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"206 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114985620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
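The abstract above mentions conflict detection for transactions running under snapshot isolation. As a rough illustration only (the paper's actual techniques are not spelled out in the abstract), here is a minimal sketch of the classic first-committer-wins write-write check used by SI systems; all names are hypothetical.

```python
# Hypothetical sketch: first-committer-wins write-write conflict detection
# under snapshot isolation. A transaction may commit only if no key in its
# write set was committed by another transaction after this transaction's
# snapshot timestamp.

class ConflictDetector:
    def __init__(self):
        self.last_commit_ts = {}   # key -> commit timestamp of latest writer
        self.clock = 0             # logical commit clock

    def begin(self):
        """Return the snapshot timestamp for a new transaction."""
        return self.clock

    def try_commit(self, snapshot_ts, write_set):
        """Return a commit timestamp, or None on a write-write conflict."""
        for key in write_set:
            if self.last_commit_ts.get(key, -1) > snapshot_ts:
                return None  # another txn committed this key after our snapshot
        self.clock += 1
        for key in write_set:
            self.last_commit_ts[key] = self.clock
        return self.clock
```

Note that first-committer-wins alone ensures SI, not serializability; detecting the read-write (anti-dependency) conflicts that SI admits requires also tracking read sets.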
{"title":"COSBench: A Benchmark Tool for Cloud Object Storage Services","authors":"Qing Zheng, Hao-peng Chen, Yaguang Wang, Jiangang Duan, Zhiteng Huang","doi":"10.1109/CLOUD.2012.52","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.52","url":null,"abstract":"With object storage services becoming increasingly accepted as replacements for traditional file or block systems, it is important to effectively measure the performance of these services, so that people can compare different solutions or tune their systems for better performance. However, little has been reported on this specific topic as yet. To address this problem, we present COSBench (Cloud Object Storage Benchmark), a benchmark tool that we are currently developing at Intel for cloud object storage services. In addition, we share the results of the experiments we have performed so far.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"280 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121821782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing STRATOS: A Cloud Broker Service","authors":"P. Pawluk, B. Simmons, Michael Smit, Marin Litoiu, Serge Mankovskii","doi":"10.1109/CLOUD.2012.24","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.24","url":null,"abstract":"This paper introduces a cloud broker service (STRATOS) that facilitates the deployment and runtime management of cloud application topologies using cloud elements/services sourced on the fly from multiple providers, based on requirements specified as higher-level objectives. Its implementation and use are evaluated in a set of experiments.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125220151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scan-Sharing for Optimizing RDF Graph Pattern Matching on MapReduce","authors":"Hyeongsik Kim, P. Ravindra, Kemafor Anyanwu","doi":"10.1109/CLOUD.2012.14","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.14","url":null,"abstract":"Recently, the number and size of RDF data collections have increased rapidly, making scalable processing techniques crucial. The MapReduce model has become a de facto standard for large-scale data processing using a cluster of machines in the cloud. Generally, RDF query processing creates join-intensive workloads, resulting in lengthy MapReduce workflows with expensive I/O, data transfer, and sorting costs. However, the MapReduce computation model provides only limited support for the static optimization techniques used in relational databases (e.g., indexing and cost-based optimization). Consequently, dynamic optimization techniques for such join-intensive tasks on MapReduce need to be investigated. In previous work, we proposed a Nested Triple Group data model and Algebra (NTGA) for efficient graph pattern query processing in the cloud. Here, we extend this work with a scan-sharing technique that optimizes the processing of graph patterns with repeated properties. Specifically, our scan-sharing technique eliminates the need for repeated scanning of input relations when properties are used repeatedly in graph patterns. We discuss a formal foundation underlying this scan-sharing technique, present an implementation strategy that has been integrated into the Apache Pig framework, and report a comprehensive evaluation demonstrating the performance benefits of our NTGA plus scan-sharing approach.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125967764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
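The core scan-sharing idea in the abstract above — avoid rescanning the triple relation once per occurrence of a repeated property — can be illustrated with a toy single-pass routine; this is an assumed simplification, not the paper's NTGA implementation, and all names are hypothetical.

```python
# Hypothetical sketch of scan-sharing: a graph pattern whose edges repeat a
# property would naively force one scan of the triple relation per edge;
# instead, scan once and route each triple to every edge that uses its
# property.

def shared_scan(triples, pattern_edges):
    """triples: iterable of (subject, property, object) tuples.
    pattern_edges: list of property names, possibly with repeats.
    Returns {edge index: [(subject, object), ...]} computed in one pass."""
    wanted = {}
    for i, prop in enumerate(pattern_edges):
        wanted.setdefault(prop, []).append(i)
    matches = {i: [] for i in range(len(pattern_edges))}
    for s, p, o in triples:          # single scan of the input relation
        for i in wanted.get(p, ()):  # fan out to all edges sharing property p
            matches[i].append((s, o))
    return matches
```

A pattern such as `?x knows ?y . ?y knows ?z` uses `knows` twice, yet the input is read only once.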
{"title":"Minimizing Latency in Serving Requests through Differential Template Caching in a Cloud","authors":"Deepak Jeswani, Manish Gupta, Pradipta De, Arpit Malani, U. Bellur","doi":"10.1109/CLOUD.2012.17","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.17","url":null,"abstract":"In the Software-as-a-Service (SaaS) cloud delivery model, a hosting center deploys a Virtual Machine (VM) image template on a server on demand. Image templates are usually maintained in a central repository. With geographically dispersed hosting centers, transferring a large, often gigabyte-sized, template file from the repository incurs high latency due to low Internet bandwidth. An architecture that maintains a template cache collocated with the hosting centers can reduce request service latency. Since templates are large, caching complete templates is prohibitive in terms of storage space. To optimize cache space requirements, as well as to reduce transfers from the repository, we propose a differential template caching technique called DiffCache. A difference file, or patch, between two templates that have common components is small. DiffCache computes an optimal selection of templates and patches based on the frequency of requests for specific templates. A template missing from the cache can be generated if a cached template can be combined with a cached patch file, saving the transfer time from the repository at the cost of a relatively small patching time. We show that patch-based caching, coupled with intelligent population of the cache, can lead to a 90% improvement in service request latency compared with caching only complete template files.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116006518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
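The cache lookup described in the abstract above — serve from a cached full template, else reconstruct by patching a cached base, else fall back to the repository — can be sketched as follows. This is an assumed illustration with hypothetical names and a toy string "patch"; DiffCache's actual patch format and selection optimization are not described in the abstract.

```python
# Hypothetical sketch of the DiffCache lookup path. full_cache holds whole
# templates; patch_cache maps (base_id, target_id) -> patch. A miss on both
# falls back to the slow transfer from the central repository.

def serve(template_id, full_cache, patch_cache, fetch_from_repo, apply_patch):
    if template_id in full_cache:                      # direct cache hit
        return full_cache[template_id]
    for (base, target), patch in patch_cache.items():  # patch-based hit
        if target == template_id and base in full_cache:
            return apply_patch(full_cache[base], patch)
    return fetch_from_repo(template_id)                # miss: slow path
```

In this toy usage a patch is just an (old, new) substring pair; real patches would come from a binary-diff tool.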
{"title":"Impact of Live Migration on Multi-tier Application Performance in Clouds","authors":"S. Kikuchi, Y. Matsumoto","doi":"10.1109/CLOUD.2012.57","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.57","url":null,"abstract":"Live migration technologies can contribute to efficient resource management in a cloud datacenter; however, they inevitably entail downtime for the virtual machine involved. Even if the downtime is relatively short, its effect can be serious for applications sensitive to response time degradation. Therefore, cloud datacenter providers should control live migration operations to minimize the impact on the performance of applications running on the cloud infrastructure. With this understanding, we studied the impact of live migration on the performance of 2-tier web applications in an experimental setup using XenServer and the RUBBoS benchmark. We revealed that the behavior of the transmission control protocol (TCP) can be the primary factor responsible for response time degradation during live migration. On the basis of the experimental results, we constructed functions to estimate the performance impact of live migration on the applications. We also examined a case study to demonstrate how cloud computing datacenters can determine the best live migration strategy to minimize application performance degradation.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122491449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Centers in the Cloud: A Large Scale Performance Study","authors":"R. Birke, L. Chen, E. Smirni","doi":"10.1109/CLOUD.2012.87","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.87","url":null,"abstract":"With the advancement of virtualization technologies and the benefit of economies of scale, industries are seeking scalable IT solutions, such as data centers hosted either in-house or by a third party. Data center availability, often via a cloud setting, is ubiquitous. Nonetheless, little is known about the in-production performance of data centers, and especially the interaction of workload demands and resource availability. This study fills this gap by conducting a large-scale survey of in-production data center servers over a period spanning two years. We provide an in-depth analysis of the time evolution of data center demands through a holistic characterization of typical data center server workloads, focusing on their basic resource components: CPU, memory, and storage systems. We pay particular attention to the seasonality of resource demands and how it is affected by geographical location. This survey provides a glimpse of the evolution of data center workloads and a basis for an economic analysis that can be used for effective capacity planning of future data centers.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122994071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Peregrine: An All-Layer-2 Container Computer Network","authors":"T. Chiueh, Cheng-Chun Tu, Yu-Cheng Wang, Pai-Wei Wang, Kai-Wen Li, Yu-Ming Huang","doi":"10.1109/CLOUD.2012.69","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.69","url":null,"abstract":"The ITRI container computer is a modular computer designed to be a building block for constructing cloud-scale data centers. Rather than using a traditional data center network architecture, which is typically based on a combination of Layer 2 switches and Layer 3 routers, the ITRI container computer's internal interconnection fabric, called Peregrine, is specially architected to meet the scalability, fast fail-over, and multi-tenancy requirements of these data centers. Peregrine is an all-Layer-2 network that is designed to support up to one million Layer 2 end points, provide quick recovery from any single network link/device failure, and incorporate dynamic load-balancing routing to make the best use of all physical network links. Finally, the Peregrine architecture is implementable using only off-the-shelf commodity Ethernet switches. This paper describes the design and implementation of a fully operational Peregrine prototype, which is built on a folded Clos physical network topology, and the results and analysis of a performance evaluation study based on measurements taken on this prototype.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129090481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lessons Learnt from the Development of GIS Application on Azure Cloud Platform","authors":"Dinesh Agarwal, S. Prasad","doi":"10.1109/CLOUD.2012.140","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.140","url":null,"abstract":"Spatial overlay processing is a widely used compute-intensive GIS application that involves aggregating two or more map layers to facilitate intelligent querying on the collocated output data. When large GIS data sets are represented in polygonal (vector) form, spatial analysis runs for extended periods of time, which is undesirable for time-sensitive applications such as emergency response. We have, for the first time, created an open-architecture-based system named Crayons for the Azure cloud platform using state-of-the-art techniques. During the development of the Crayons system, we faced numerous challenges and gained invaluable insights into the Azure cloud platform, which we present in detail in this paper. The challenges range from limitations of cloud storage and computational services to the choices of tools and technologies for high-performance computing (HPC) application design. We report our findings as concrete guidelines for eScience developers on 1) the choice of persistent data storage mechanism, 2) data structure representation, 3) communication and synchronization among nodes, 4) building robust fail-safe applications, and 5) optimal, cost-effective utilization of resources. Our insights into each challenge, the solution to overcome it, and the lessons learnt from each challenge can help eScience developers starting application development on Azure and possibly other cloud platforms.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129649046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"F2Box: Cloudifying F2F Storage Systems with High Availability Correlation","authors":"Raúl Gracia-Tinedo, Marc Sánchez-Artigas, P. García-López","doi":"10.1109/CLOUD.2012.22","DOIUrl":"https://doi.org/10.1109/CLOUD.2012.22","url":null,"abstract":"The increasing popularity of Cloud storage services is leading end users to store their digital lives (photos, videos, work documents, etc.) in the Cloud. However, many users are still reluctant to move their data to the Cloud because of the amount of control ceded to Cloud vendors. To let users retain control over their data, Friend-to-Friend (F2F) storage systems have been presented in the literature as a promising alternative. However, as we show in this paper, pure F2F storage systems offer poor QoS, mainly due to availability correlations, which makes them unattractive to end users. To overcome this limitation, we propose a hybrid architecture that combines F2F storage systems with the availability of Cloud storage services, letting users strike the right balance between user control and quality of service. This architecture, which we call F2Box, delivers such a balance thanks to a new suite of data transfer scheduling strategies and a new redundancy calculation algorithm. The main feature of this algorithm is that it allows users to adjust the amount of redundancy according to the availability patterns exhibited by their friends. Our simulation and experimental results (in Amazon S3) demonstrate the substantial benefits experienced by end users as a result of the \"cloudification\" of F2F systems.","PeriodicalId":214084,"journal":{"name":"2012 IEEE Fifth International Conference on Cloud Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126806176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
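Availability-driven redundancy of the kind the abstract above attributes to F2Box can be illustrated with a simple replica-count calculation. This sketch assumes friends are online independently with a fixed probability, which is exactly the assumption the paper challenges (availability correlations), so treat it as a baseline only; names and formula are hypothetical.

```python
# Hypothetical baseline: smallest replica count n such that the combined
# availability 1 - (1 - avail)**n of n independent friend replicas meets the
# target. Correlated downtime would raise the required n, or push some
# replicas to the Cloud instead.

def replicas_needed(avail, target):
    """Smallest n with 1 - (1 - avail)**n >= target (0 < avail <= 1)."""
    n, covered = 0, 0.0
    while covered < target:
        n += 1
        covered = 1 - (1 - avail) ** n
    return n
```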