Dissecting the Workload of Cloud Storage System

Yaodanjun Ren, Xiaoyi Sun, Kai Li, Jiale Lin, Shuzhi Feng, Zhenyu Ren, Jian Yin, Zhengwei Qi
Published in: 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)
Publication date: 2022-07-01
DOI: 10.1109/ICDCS54860.2022.00068 (https://doi.org/10.1109/ICDCS54860.2022.00068)
Citations: 0

Abstract

Workload analysis has long shaped the innovation and evolution of file and storage systems. Although cloud storage systems are widely deployed and used, real-world, large-scale studies of cloud storage workloads are rare. Earlier large-scale distributed storage systems were built to meet versatility, stability, and reliability requirements. Modern cloud storage systems face additional challenges, such as coping with surges in peak load and rapid growth in request volume, and these changes may lead to different workload characteristics.

In this work, we propose DiTing, a data tracing system, and collect workloads with over 242,000 billion requests from Alibaba Cloud. By comparing normal days with Singles' Day (the world's largest online shopping festival), we analyze characteristics such as I/O scale, latency, locality, and load distribution. Our analysis reveals four key observations. First, the virtual layer is the performance bottleneck of modern cloud storage systems during extreme peak periods. Second, write operations dominate data access because application and operating system buffers absorb reads better than writes. Third, the workload is heavily skewed toward a small percentage of virtual cloud disks, with 20% of cloud disks accounting for 80% of I/O requests. Finally, data access shows poor temporal and spatial locality, and most I/O requests are small.

Based on these observations, we propose several suggestions for cloud storage systems, including moving I/O processing out of the virtual layer and into the proxy layer, deploying heavy-workload and light-workload applications on the same node, and adopting a write-friendly cloud disk design for write-skewed requests. In summary, these workload characteristics and suggestions are useful for designing and implementing next-generation cloud storage systems.
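The skew observation (20% of cloud disks accounting for 80% of I/O requests) can be checked on any per-disk request count with a few lines of code. The following is a minimal sketch, not the authors' DiTing tooling: the function name `top_share`, the disk identifiers, and the request counts are all hypothetical and chosen only to illustrate the calculation.

```python
def top_share(requests_per_disk, fraction=0.2):
    """Share of all requests served by the busiest `fraction` of disks.

    `requests_per_disk` maps a disk identifier to its request count.
    This is an illustrative helper, not part of any published tool.
    """
    counts = sorted(requests_per_disk.values(), reverse=True)
    if not counts:
        raise ValueError("at least one disk is required")
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always consider at least one disk, even for very small fleets.
    k = max(1, round(len(counts) * fraction))
    return sum(counts[:k]) / total


# Hypothetical fleet: 2 busy disks and 8 lightly used disks (made-up numbers).
hypothetical_counts = {"disk-0": 400, "disk-1": 400}
hypothetical_counts.update({f"disk-{i}": 25 for i in range(2, 10)})

share = top_share(hypothetical_counts, fraction=0.2)
print(f"top 20% of disks serve {share:.0%} of requests")  # prints 80% for this made-up data
```

Sorting by request count and summing the head of the list is the simplest way to express this kind of concentration; for a real trace, the per-disk counts would first have to be aggregated from individual request records.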