GeoCol: A Geo-distributed Cloud Storage System with Low Cost and Latency using Reinforcement Learning
Authors: Haoyu Wang, Haiying Shen, Zijian Li, Shuhao Tian
Published in: 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), 2021-07-01
DOI: 10.1109/ICDCS51616.2021.00023 (https://doi.org/10.1109/ICDCS51616.2021.00023)
Citations: 7
Abstract
A growing number of web applications are deployed on cloud storage services that store the applications' data objects in geo-distributed datacenters belonging to Cloud Service Providers (CSPs). To provide low request latency to web application users, previous work requires web application developers either to store more data object replicas in a large number of datacenters or to send redundant requests to multiple datacenters (e.g., the closest datacenters), both of which increase monetary cost. In this paper, we measured request latency from a GENI server (acting as a client) to AWS S3 datacenters for one month, and our observations lay the foundation for our proposed system, GeoCol, a geo-distributed cloud storage system with low cost and latency using reinforcement learning (RL). To achieve the optimal tradeoff between monetary cost and request latency, GeoCol encompasses a request split method and a storage planning method. The request split method uses the SARIMA machine learning (ML) technique to predict request latency; the prediction is fed to an RL model that determines, for each request, the number of sub-requests and the datacenter for each sub-request, enabling parallel transmission of a data object. In the storage planning method, each datacenter uses RL to determine whether each data object should be stored and, if so, the storage type of each stored data object. Our trace-driven experiments on AWS S3 and the GENI platform show that GeoCol outperforms the comparison methods, reducing monetary cost by 32% and data object request latency by 51%.
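
To make the request split idea concrete, the sketch below shows one way a single object read might be divided into byte-range sub-requests sent to several datacenters in parallel. It is only an illustration: the abstract does not describe the RL policy in detail, so a simple greedy heuristic (choose the datacenters with the lowest predicted latency and size each range inversely to that latency) stands in for it. The function name `split_request`, the datacenter names, and the latency values are all hypothetical, and the predicted latencies are assumed to come from a separate forecasting step such as SARIMA.

```python
from typing import Dict, List, Tuple


def split_request(
    object_size: int,
    predicted_latency_ms: Dict[str, float],
    num_sub_requests: int,
) -> List[Tuple[str, int, int]]:
    """Split one object read into contiguous byte-range sub-requests.

    Heuristic stand-in (an assumption, not GeoCol's RL policy): pick the
    datacenters with the lowest predicted latency and give each a share of
    the object inversely proportional to that latency.

    Returns (datacenter, start_byte, end_byte_exclusive) tuples.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not 1 <= num_sub_requests <= len(predicted_latency_ms):
        raise ValueError("num_sub_requests must be in [1, number of datacenters]")
    if any(latency <= 0 for latency in predicted_latency_ms.values()):
        raise ValueError("predicted latencies must be positive")

    # Datacenters ordered from lowest to highest predicted latency.
    ranked = sorted(predicted_latency_ms, key=lambda dc: predicted_latency_ms[dc])
    chosen = ranked[:num_sub_requests]

    # Lower predicted latency -> larger weight -> larger byte range.
    weights = [1.0 / predicted_latency_ms[dc] for dc in chosen]
    total = sum(weights)

    plan: List[Tuple[str, int, int]] = []
    start = 0
    cumulative = 0.0
    for i, (dc, weight) in enumerate(zip(chosen, weights)):
        cumulative += weight
        # The last range always ends at object_size so no byte is dropped.
        if i == len(chosen) - 1:
            end = object_size
        else:
            end = int(object_size * cumulative / total)
        plan.append((dc, start, end))
        start = end
    return plan


if __name__ == "__main__":
    # Hypothetical latency predictions (ms) for three datacenters.
    predictions = {"us-east-1": 100.0, "us-west-2": 50.0, "eu-west-1": 200.0}
    for dc, start, end in split_request(1_000_000, predictions, 2):
        print(f"{dc}: bytes [{start}, {end})")
```

In the system the abstract describes, both the number of sub-requests and the datacenter for each one are chosen by the RL model, which also has to weigh monetary cost; the heuristic here considers predicted latency only.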