Preventing Data Popularity Concentration in HDFS based Cloud Storage

T. Shwe, M. Aritsugi
Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion, published 2019-12-02
DOI: 10.1145/3368235.3368843 (https://doi.org/10.1145/3368235.3368843)
Citations: 1

Abstract

Preventing Data Popularity Concentration in HDFS based Cloud Storage
The Hadoop Distributed File System (HDFS) often develops skew in data storage over time, mainly because of its random data block allocation policy, datanode failures, replica reconstruction, and client activity, which leads to utilization and load imbalance in the system. Although HDFS provides a tool to rebalance the data in the cluster, the balancer considers only disk space utilization among nodes, re-allocating data from highly utilized nodes to less utilized ones. Data access skew, which arises when a large amount of popular data piles up on one node, is therefore not addressed by the default HDFS balancer. To address this issue, we present a popularity-aware balancer based on a node popularity score, which spreads popular data uniformly among datanodes, balancing future access load and reducing hot spots in the cloud storage system. Simulation results demonstrate the promising benefits of the proposed popularity-aware balancer, showing a uniform distribution of popular data across nodes without compromising the amount of data transferred or the variance in disk space.
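The abstract does not define how the node popularity score is computed or how blocks are chosen for relocation, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that a node's score is the sum of the access counts of the blocks it holds, and it greedily moves one block at a time from the most popular node to the least popular one. The names `Block`, `node_scores`, and `rebalance_by_popularity` are hypothetical, and the sketch ignores replicas, rack awareness, and disk space limits.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass(frozen=True)
class Block:
    block_id: str
    size_mb: int
    access_count: int  # assumed popularity measure, not taken from the paper


def node_scores(placement):
    """Assumed node popularity score: sum of access counts of the node's blocks."""
    return {node: sum(b.access_count for b in blocks)
            for node, blocks in placement.items()}


def rebalance_by_popularity(placement, max_moves=100):
    """Greedily move blocks from the hottest node to the coldest node.

    Returns the new placement and the list of (block_id, source, target) moves.
    """
    placement = {node: list(blocks) for node, blocks in placement.items()}
    moves = []
    for _ in range(max_moves):
        scores = node_scores(placement)
        hot = max(scores, key=scores.get)
        cold = min(scores, key=scores.get)
        gap = scores[hot] - scores[cold]
        # Moving a block with popularity p turns the gap into |gap - 2p|,
        # so only blocks with 0 < p < gap make the pair more even.
        candidates = [b for b in placement[hot] if 0 < b.access_count < gap]
        if not candidates:
            break
        # Prefer the block that leaves the two nodes closest to each other.
        best = min(candidates, key=lambda b: abs(gap - 2 * b.access_count))
        placement[hot].remove(best)
        placement[cold].append(best)
        moves.append((best.block_id, hot, cold))
    return placement, moves


# Small demonstration with made-up data.
demo = {
    "dn1": [Block("a", 128, 50), Block("b", 128, 30), Block("c", 128, 20)],
    "dn2": [Block("d", 128, 10)],
    "dn3": [],
}
balanced, demo_moves = rebalance_by_popularity(demo)
moved_mb = sum(b.size_mb for bs in demo.values() for b in bs
               if b.block_id in {m[0] for m in demo_moves})
print("score variance before:", pvariance(node_scores(demo).values()))
print("score variance after: ", pvariance(node_scores(balanced).values()))
print("data transferred (MB):", moved_mb)
```

Each accepted move strictly lowers the variance of the node scores, so the loop stops on its own once no block on the hottest node can narrow the gap. A balancer of the kind the abstract describes would additionally have to account for disk space utilization and the volume of data transferred, which this sketch only reports and does not constrain.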