GeoCol: A Geo-distributed Cloud Storage System with Low Cost and Latency using Reinforcement Learning

Haoyu Wang, Haiying Shen, Zijian Li, Shuhao Tian
Published in: 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)
Publication date: 2021-07-01
DOI: 10.1109/ICDCS51616.2021.00023 (https://doi.org/10.1109/ICDCS51616.2021.00023)
Citation count: 7

Abstract

A growing number of web applications are deployed on cloud storage services that store their data objects in geo-distributed datacenters belonging to Cloud Service Providers (CSPs). To provide low request latency to web application users, previous work requires developers either to store more data object replicas across a large number of datacenters or to send redundant requests to multiple datacenters (e.g., the closest ones), both of which increase monetary cost. In this paper, we measured request latency from a GENI server (acting as a client) to AWS S3 datacenters for one month, and our observations lay the foundation for our proposed system, GeoCol, a geo-distributed cloud storage system with low cost and latency using reinforcement learning (RL). To achieve the optimal tradeoff between monetary cost and request latency, GeoCol comprises a request split method and a storage planning method. The request split method uses the SARIMA machine learning (ML) technique to predict request latency; the prediction is fed to an RL model that determines, for each request, the number of sub-requests and the datacenter that serves each sub-request, enabling parallel transmission of a data object. In the storage planning method, each datacenter uses RL to determine whether each data object should be stored and, if so, which storage type to use. Our trace-driven experiments on AWS S3 and the GENI platform show that GeoCol outperforms the comparison methods, reducing monetary cost by 32% and data object request latency by 51%.
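
The request split idea described in the abstract can be illustrated with a minimal sketch. This is not GeoCol's method: the abstract says an RL model makes the decision from SARIMA latency predictions, whereas the sketch below swaps in a plain exhaustive search over datacenter subsets and takes the latency forecasts as hard-coded inputs. The function names (`split_ranges`, `plan_request`), the equal-size split, the `weight` parameter, the scoring formula, and all numbers are assumptions made for illustration only.

```python
from itertools import combinations


def split_ranges(size, k):
    """Split [0, size) into k contiguous byte ranges of near-equal length."""
    base, extra = divmod(size, k)
    ranges, start = [], 0
    for i in range(k):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def plan_request(size_mb, predicted, price_per_mb, weight=0.5, max_parts=3):
    """Pick the datacenter set with the lowest weighted latency/cost score.

    predicted:    {datacenter: (round_trip_s, throughput_mb_per_s)}
                  hypothetical forecasts standing in for SARIMA output
    price_per_mb: {datacenter: dollars_per_mb}, hypothetical transfer prices
    weight:       assumed tradeoff knob; 1.0 = latency only, 0.0 = cost only

    Exhaustive search is used here in place of the RL model.
    Returns (score, datacenters, latency_s, cost_dollars).
    """
    best = None
    for k in range(1, min(max_parts, len(predicted)) + 1):
        for dcs in combinations(sorted(predicted), k):
            part = size_mb / k  # equal-size sub-requests (a simplification)
            # Parallel transfers finish when the slowest sub-request finishes.
            latency = max(predicted[d][0] + part / predicted[d][1] for d in dcs)
            cost = sum(part * price_per_mb[d] for d in dcs)
            score = weight * latency + (1 - weight) * cost
            if best is None or score < best[0]:
                best = (score, dcs, latency, cost)
    return best


# Hypothetical inputs: three regions, a 40 MB object.
predicted = {
    "us-east-1": (0.05, 10.0),
    "us-west-2": (0.10, 8.0),
    "eu-west-1": (0.20, 5.0),
}
price = {"us-east-1": 0.02, "us-west-2": 0.02, "eu-west-1": 0.03}

score, datacenters, latency, cost = plan_request(40, predicted, price)
byte_ranges = split_ranges(40 * 1024 * 1024, len(datacenters))
print(datacenters, round(latency, 3), round(cost, 3), byte_ranges)
```

With these made-up inputs, splitting across the two faster regions beats both a single-region fetch and a three-way split, because the slowest region dominates the completion time of a parallel transfer while adding cost. A learned policy would presumably replace the enumeration once the number of datacenters or the forecast uncertainty makes exhaustive search impractical.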