ClusterSR: Cluster-Aware Scattered Repair in Erasure-Coded Storage
Zhirong Shen, J. Shu, Zhijie Huang, Yingxun Fu
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 42-51, 2020-05-01
DOI: 10.1109/IPDPS47924.2020.00015 (https://doi.org/10.1109/IPDPS47924.2020.00015)
Citations: 15
Abstract
Erasure coding is a storage-efficient means to guarantee data reliability in today’s commodity storage systems, yet its repair performance is seriously hindered by substantial repair traffic. Repair in clustered storage systems is even more complicated because cross-cluster bandwidth is scarce. We present ClusterSR, a cluster-aware scattered repair approach. ClusterSR minimizes cross-cluster repair traffic by carefully choosing the clusters for reading and repairing chunks. It further balances the cross-cluster repair traffic by scheduling the repair of multiple chunks. Large-scale simulation and Alibaba Cloud ECS experiments show that ClusterSR reduces cross-cluster repair traffic by 6.7-52.7% and improves repair throughput by 14.1-68.8%.