Zebra: Demand-aware erasure coding for distributed storage systems
Authors: Jun Li, Baochun Li
Venue: 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS)
Publication date: 2016-06-20
DOI: 10.1109/IWQoS.2016.7590388 (https://doi.org/10.1109/IWQoS.2016.7590388)
Erasure coding is increasingly replacing replication in distributed storage systems, thanks to its lower storage overhead at the same level of failure tolerance. However, as storage overhead decreases, the reconstruction overhead of erasure codes can increase significantly. Under ever-changing workloads, in which data access can be highly skewed, it is difficult to achieve a good trade-off between storage overhead and reconstruction overhead. In this paper, we propose Zebra, a framework that encodes data into multiple tiers according to demand. Given the overall storage overhead and the number of failures to tolerate, Zebra determines the erasure coding parameters of each tier by solving a geometric programming problem. Based on the demand for data, Zebra dynamically assigns data to the corresponding tiers to minimize the overall reconstruction overhead, achieving a flexible trade-off between storage overhead and reconstruction overhead across tiers: hot data incurs less reconstruction overhead, while cold data is stored with lower storage overhead. When demand changes, Zebra adjusts itself accordingly with a marginal amount of network transfer.
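The trade-off described in the abstract can be made concrete with a small sketch. It is not Zebra's method: the abstract says Zebra solves a geometric programming problem, whereas this example simply brute-forces integer parameters. It assumes each tier uses a (k, r) MDS code such as Reed-Solomon, with k data blocks and r parity blocks, so that the storage overhead is (k + r) / k and rebuilding one lost block reads k surviving blocks. The function names, tier sizes, demand weights, and budget are all hypothetical.

```python
from fractions import Fraction
from itertools import product


def storage_overhead(k, r):
    """Stored bytes per original byte for a (k, r) MDS code (assumed model)."""
    return Fraction(k + r, k)


def plan_tiers(sizes, demands, r, budget, k_choices=range(1, 13)):
    """Brute-force sketch (not geometric programming): pick k for each tier.

    Minimizes the demand-weighted number of blocks read per reconstruction,
    subject to the size-weighted average storage overhead staying <= budget.
    Every tier uses the same r, so every tier tolerates r failures.
    Returns (cost, ks, overall_overhead), or None if no choice fits.
    """
    total = sum(sizes)
    best = None
    for ks in product(k_choices, repeat=len(sizes)):
        overall = sum(s * storage_overhead(k, r) for s, k in zip(sizes, ks)) / total
        if overall > budget:
            continue  # violates the overall storage budget
        cost = sum(d * k for d, k in zip(demands, ks))
        if best is None or cost < best[0]:
            best = (cost, ks, overall)
    return best


# Hypothetical workload: 10% of the data is hot and draws 80% of the demand.
budget = Fraction(3, 2)
tiered = plan_tiers(sizes=[1, 9], demands=[8, 2], r=2, budget=budget)
single = plan_tiers(sizes=[10], demands=[10], r=2, budget=budget)
print("two tiers:", tiered)
print("one tier: ", single)
```

With these made-up numbers, both plans use the full 1.5x storage budget, but the two-tier plan picks k = 1 for the hot tier (equivalent to keeping three copies when r = 2) and k = 6 for the cold tier, for a weighted reconstruction cost of 20, while a single tier must use k = 4 everywhere at a cost of 40. Exact fractions are used for the budget check so that plans sitting exactly on the budget are not rejected by floating-point rounding.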