D3: Deterministic Data Distribution for Efficient Data Reconstruction in Erasure-Coded Distributed Storage Systems
Zhipeng Li, Min Lv, Yinlong Xu, Yongkun Li, Liangliang Xu
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2019
DOI: 10.1109/IPDPS.2019.00064 (https://doi.org/10.1109/IPDPS.2019.00064)
Citations: 8
Abstract
Because individual commodity components are unreliable, failures are common in large-scale distributed storage systems. Erasure codes are widely deployed in practical storage systems to provide fault tolerance with low storage overhead. However, the random data placement commonly used in erasure-coded storage systems induces heavy cross-rack traffic, load imbalance, and random access, all of which slow down recovery after a failure. In this paper, we use orthogonal arrays to define a Deterministic Data Distribution (D^3) of blocks to nodes and racks, and we propose an efficient failure recovery approach based on D^3. D^3 not only distributes data and parity blocks uniformly among storage servers, but also balances the repair traffic among racks and storage servers during failure recovery. Furthermore, D^3 minimizes the cross-rack repair traffic for data layouts that protect against a single rack failure, and it provides sequential access during failure recovery. We implement D^3 in the Hadoop Distributed File System (HDFS) on a cluster of 28 machines. Our experiments show that D^3 significantly speeds up failure recovery compared with random data distribution, e.g., by a factor of 2.21 for the (6, 3)-RS code in a system consisting of eight racks with three nodes in each rack.
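
The abstract does not spell out how D^3 derives its placement from orthogonal arrays, so the following is only a rough illustration of the balance property that makes such arrays attractive for deterministic layouts. It builds a standard strength-2 orthogonal array with p^2 rows and p + 1 columns over p symbols (p prime) and checks that every pair of columns contains each ordered symbol pair exactly once. The function names `orthogonal_array` and `has_strength_two` are hypothetical, and this textbook construction is an assumption for illustration, not the paper's actual mapping of blocks to nodes and racks.

```python
from collections import Counter
from itertools import combinations, product


def orthogonal_array(p: int) -> list[tuple[int, ...]]:
    """Build a p^2 x (p + 1) array over symbols 0..p-1.

    Row (i, j) holds (i + c*j) mod p for c = 0..p-1, followed by j.
    For prime p this is a strength-2 orthogonal array of index 1.
    """
    rows = []
    for i, j in product(range(p), repeat=2):
        row = tuple((i + c * j) % p for c in range(p)) + (j,)
        rows.append(row)
    return rows


def has_strength_two(rows: list[tuple[int, ...]], p: int) -> bool:
    """Return True if every column pair shows each symbol pair exactly once."""
    ncols = len(rows[0])
    for a, b in combinations(range(ncols), 2):
        counts = Counter((r[a], r[b]) for r in rows)
        if len(counts) != p * p or set(counts.values()) != {1}:
            return False
    return True


if __name__ == "__main__":
    oa = orthogonal_array(3)
    for row in oa:
        print(row)
    print("strength 2:", has_strength_two(oa, 3))
```

If the symbols in a column are read as, say, node indices, the strength-2 property means that no pair of indices is favored over another across rows. That kind of uniformity is what a deterministic layout needs in order to spread both stored blocks and repair reads evenly. How D^3 actually assigns columns and symbols to racks and nodes is defined in the paper itself.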